Preface


Quantum computing is a beautiful combination of quantum physics, computer science, and
information theory. The purpose of this book is to make this exciting research area accessible to a
broad audience. In particular, we endeavor to help the reader bridge the conceptual and notational 
barriers that separate quantum computing from conventional computing. 

The book is concerned with theory: what changes when the classical model underpinning 
conventional computing is replaced with a quantum one. It contains only a brief discussion of 
the ongoing efforts to build quantum computers, an active area which is still so young that it is 
impossible even for experts to predict which approaches will be most successful. While this book 
is about theory, it is important to ground the discussion of quantum computation in the physics that 
motivates it. For this reason, the text includes discussions of quantum physics and experiments 
that illustrate why the theory is defined the way it is. 

We precisely define concepts used in quantum computation and emphasize subtle distinctions.
This rigor is motivated in part by our experience working with members of the joint
FXPAL/PARC reading group and with reviewing papers by authors new to the field. Mistakes
commonly arise due to a lack of precision. For example, we take care to distinguish a quantum 
state from a vector that represents it. We make clear which notions are basis dependent (e.g., 
superposition) and which are not (e.g., entanglement), and emphasize the dependence of certain 
notions (e.g., entanglement) on a particular tensor decomposition. The distinction between tensor 
decompositions and direct sum decompositions, both used extensively in quantum mechanics, 
is discussed explicitly in both quantum mechanical and classical probabilistic settings. Definitions
are carefully motivated. For example, instead of starting with axioms for density operators
or mixed states, the definitions of these concepts are motivated by a discussion of what can be 
deduced about a subsystem from measurements of the subsystem alone. 

One advantage of dealing only with theory, and not with the efforts to build quantum computers, 
is that the amount of quantum physics and supporting mathematics needed is reduced. We are 
able to develop all of the necessary quantum mechanics within the book; no previous exposure to 
quantum physics is required. We give careful and precise descriptions of fundamental concepts— 
such as quantum state spaces, quantum measurement, and entanglement—before covering the 
standard quantum algorithms and other quantum information processing tasks such as quantum
key distribution and quantum teleportation. 

The intent of this book is to make quantum computing accessible to a wide audience of computer 
scientists, engineers, mathematicians, and anyone with a general interest in the subject who knows 
sufficient mathematics. Basic concepts from college-level linear algebra such as vector spaces, 
linear transformations, eigenvalues, and eigenvectors are used throughout the book. A few sections 
require more mathematics; familiarity with group theory is required for sections 8.6.1 and 8.6.2, 
appendix B, and much of chapter 11. Group theory is reviewed in boxes, but readers who have 
never seen group theory should consult a book on the subject or skip those sections. 

While we hope our book lives up to the "gentle" of its title, reading it will require effort. Many of
the concepts are subtle and unintuitive, and much of the notation unfamiliar. Readers will need to 
spend time working with the concepts and notations to develop a level of fluency at each stage. 
For example, even readers with significant mathematical background may not have worked much 
with tensor products and may not be familiar with the relation of tensor product spaces to their 
component spaces. The early chapters of the book develop these notions carefully, since they are 
absolutely fundamental to quantum information processing. It is well worth the effort to master 
them, as well as the concise Dirac notation in which they are generally expressed, but mastery will 
require effort. The precise nature of these mathematical formalisms provides a means of working 
with quantum concepts before fully understanding them. Intuition for quantum mechanics and 
quantum information processing will develop from playing with the formal mathematics. 

The book emphasizes features of quantum mechanics that give quantum computation its power 
and are responsible for its limitations. Neither the extent of the power of quantum computation 
nor its limitations have been fully understood. Research challenges remain not only in building
quantum computers and developing novel algorithms and protocols, but also in answering
fundamental questions as to the source of quantum computing’s power and the reasons for its 
limitations. This book examines what is known about what quantum computers can and cannot 
do, and also explores what is known about why. 

The focus on the reasons underlying quantum computing’s effectiveness results in the inclusion 
of topics frequently left out of other expositions of the subject. For example, one theme of the 
book is the relationship of quantum information processing to probability. That many quantum 
algorithms are nonprobabilistic is emphasized. A section is devoted to modifications of Grover’s 
original algorithm that preserve the speed-up but return a solution with certainty. On the other 
hand, the strong formal resemblance between quantum theory and probability theory is described 
in detail and distinctions are highlighted, illuminating, for example, how entanglement differs 
from correlation, and the difference between a superposition and a mixture. 

As another example, while quantum entanglement is the most common explanation given for 
why quantum information processing works, multipartite entanglement remains poorly understood.
Bipartite entanglement is much better understood but has limited use for understanding
quantum computation. The book includes sections on multipartite entanglement, a topic often left
out of introductory books, and discusses bipartite entanglement. Discussions of multipartite
entanglement require examples, which made it natural to include a section on cluster states, 
the fundamental entanglement resource used for cluster state, or one-way, quantum computation. 
Cluster state quantum computation and adiabatic quantum computation, two alternatives to the 
standard circuit model, are briefly introduced and their strengths and applications discussed. 

As a final example, while the conversion between general classical circuits and reversible 
classical circuits is a purely classical topic, it is the heart of the proof that anything a classical 
computer can do, a quantum computer can do with comparable efficiency. For this reason, the
book includes a detailed account of this piece of classical, but nonstandard, computer science. 

This is not a book about quantum mechanics. We treat quantum mechanics as an abstract 
mathematical theory and consider the physical aspects only to elucidate theoretical concepts. We 
do not discuss issues of interpretation of quantum mechanics; the occasional use of terms such 
as quantum parallelism, for example, is not to be construed as an endorsement of one or another 
particular interpretation. 
Introduction


In the last decades of the twentieth century, scientists sought to combine two of the century’s most 
influential and revolutionary theories: information theory and quantum mechanics. Their success 
gave rise to a new view of computation and information. This new view, quantum information 
theory, changed forever how computation, information, and their connections with physics are 
thought about, and it inspired novel applications, including some wildly different algorithms and 
protocols. This view and the applications it spawned are the subject of this book. 

Information theory, which includes the foundations of both computer science and communications,
abstracted away the physical world so effectively that it became possible to talk about the
major issues within computer science and communications, such as the efficiency of an algorithm 
or the robustness of a communication protocol, without understanding details of the physical 
devices used for the computation or the communication. This ability to ignore the underlying 
physics proved extremely powerful, and its success can be seen in the ubiquity of the computing 
and communications devices around us. The abstraction away from the physical had become such 
a part of the intellectual landscape that the assumptions behind it were almost forgotten. At its 
heart, until recently, information sciences have been firmly rooted in classical mechanics. For 
example, the Turing machine is a classical mechanical model that behaves according to purely 
classical mechanical principles. 

Quantum mechanics has played an ever-increasing role in the development of new and more 
efficient computing devices. Quantum mechanics underlies the working of traditional, classical 
computers and communication devices, from the transistor through the laser to the latest hardware 
advances that increase the speed and power and decrease the size of computer and communications 
components. Until recently, the influence of quantum mechanics remained confined to the
low-level implementation realm; it had no effect on how computation or communication was thought
of or studied. 

In the early 1980s, a few researchers realized that quantum mechanics had unanticipated implications
for information processing. Charles Bennett and Gilles Brassard, building on ideas of
Stephen Wiesner, showed how nonclassical properties of quantum measurement provided a provably
secure mechanism for establishing a cryptographic key. Richard Feynman, Yuri Manin,
and others recognized that certain quantum phenomena, those associated with so-called
entangled particles, could not be simulated efficiently by a Turing machine. This observation
led to speculation that perhaps these quantum phenomena could be used to speed up computation
in general. Such a program required rethinking the information theoretic model underlying
computation, taking it out of the purely classical realm.

Quantum information processing, a field that includes quantum computing, quantum cryptography,
quantum communications, and quantum games, explores the implications of using quantum
mechanics instead of classical mechanics to model information and its processing. Quantum computing
is not about changing the physical substrate on which computation is done from classical
to quantum, but rather changing the notion of computation itself. The change starts at the most 
basic level: the fundamental unit of computation is no longer the bit, but rather the quantum bit 
or qubit. Placing computation on a quantum mechanical foundation led to the discovery of faster 
algorithms, novel cryptographic mechanisms, and improved communication protocols. 

The phrase quantum computing does not parallel the phrases DNA computing or optical computing:
these describe the substrate on which computation is done without changing the notion
of computation. Classical computers, the ones we all have on our desks, make use of quantum
mechanics, but they compute using bits, not qubits. For this reason, they are not considered 
quantum computers. A quantum or classical computer may or may not be an optical computer, 
depending on whether optical devices are used to carry out the computation. Whether the computer
is quantum or classical depends on whether the information is represented and manipulated
in a quantum or classical way. The phrase quantum computing is closer in character to analog 
computing because the computational model for analog computing differs from that of standard 
computing: a continuum of values, rather than only a discrete set, is allowed. While the phrases 
are parallel, the two models differ greatly in that analog computation does not support entanglement,
a key resource for quantum computation, and measurements of a quantum computer’s
registers can yield only a small, discrete set of values. Furthermore, while a qubit can take on a 
continuum of values, in many ways a qubit resembles a bit, with its two discrete values, more 
than it does analog computation. For example, as we will see in section 4.3.1, only one bit’s worth 
of information can be extracted from a qubit by measurement. 

The field of quantum information processing developed slowly in the 1980s and early 1990s 
as a small group of researchers worked out a theory of quantum information and quantum information
processing. David Deutsch developed a notion of a quantum mechanical Turing machine.
Ethan Bernstein, Umesh Vazirani, and Andrew Yao improved upon his model and showed that
a quantum Turing machine could simulate a classical Turing machine, and hence any classical
computation, with at most a polynomial time slowdown. The standard quantum circuit
model was then defined, which led to an understanding of quantum complexity in terms of a 
set of basic quantum transformations called quantum gates. These gates are theoretical constructs
that may or may not have direct analogs in the physical components of an actual quantum
computer. 

In the early 1990s, researchers developed the first truly quantum algorithms. In spite of the 
probabilistic nature of quantum mechanics, the first quantum algorithms, for which superiority 
over classical algorithms could be proved, give the correct answer with certainty. They improve
upon classical algorithms by solving in polynomial time with certainty a problem that can be 
solved in polynomial time only with high probability using classical techniques. Such a result is 
of no direct practical interest, since the impossibility of building a perfect machine reduces any 
practical machine running any algorithm to solving a problem only with high probability. But 
such results were of high theoretical interest, since they showed for the first time that quantum 
computation is theoretically more powerful than classical computation for certain computational 
problems. 

These results caught the interest of various researchers, including Peter Shor, who in 1994 surprised
the world with his polynomial-time quantum algorithm for factoring integers. This result
provided a solution to a well-studied problem of practical interest. A classical polynomial-time 
solution had long been sought, to the point where the world felt sufficiently confident that no 
such solution existed that many security protocols, including the widely used RSA algorithm, 
base their security entirely on the computational difficulty of this problem. It is unknown whether 
an efficient classical solution exists, so Shor’s result does not prove that quantum computers can 
solve a problem more efficiently than a classical computer. But even in the unlikely event that a 
polynomial-time classical algorithm is found for this problem, it would be an indication of the elegance
and effectiveness of the quantum information theory point of view that a quantum algorithm,
in spite of all the unintuitive aspects of quantum mechanics, was easier to find. 

While Shor’s result sparked a lot of interest in the field, doubts as to its practical significance 
remained. Quantum systems are notoriously fragile. Key properties, such as quantum entanglement,
are easily disturbed by environmental influences that cause the quantum states to decohere.
Properties of quantum mechanics, such as the impossibility of reliably copying an unknown 
quantum state, made it look unlikely that effective error-correction techniques for quantum computation
could ever be found. For these reasons, it seemed unlikely that reliable quantum computers
could be built. 

Luckily, in spite of serious and widespread doubts as to whether quantum information processing
could ever be practical, the theory itself proved so tantalizing that researchers continued to
explore it. As a result, in 1996 Shor and Robert Calderbank, and independently Andrew Steane, 
saw a way to finesse the seemingly show-stopping problems of quantum mechanics to develop 
quantum error correction techniques. Today, quantum error correction is arguably the most mature 
area of quantum information processing. 

How practical quantum computing and quantum information will turn out is still unknown. No 
fundamental physical principles are known that prohibit the building of large-scale and reliable 
quantum computers. Engineering issues, however, remain. As of this writing, laboratory experiments
have demonstrated quantum computations with several quantum bits performing dozens
of quantum operations. Myriad promising approaches are being explored by theorists and experimentalists
around the world, but much uncertainty remains as to how, when, or even whether, a
quantum computer capable of carrying out general quantum computations on hundreds of qubits 
will be built. 





Quantum computational approaches improve upon classical methods for a number of specialized
tasks. The extent of quantum computing’s applicability is still being determined. It does not
provide efficient solutions to all problems; neither does it provide a universal way of circumventing
the slowing of Moore’s law. Strong limitations on the power of quantum computation are
known; for many problems, it has been proven that quantum computation provides no significant
advantage over classical computation. Grover’s algorithm, the other major algorithm of the
mid-1990s, provides a small speedup for unstructured search algorithms. But it is also known that this
small speedup is the most that quantum algorithms can attain. Grover’s search algorithm applies
to unstructured search. For other search problems, such as searching an ordered list, quantum
computation provides no significant advantage over classical computation. Simulation of quantum
systems is the other significant application of quantum computation known in the mid-1990s.
Of interest in its own right, the simulation of increasingly larger quantum systems may provide
a bootstrap that will ultimately lead to the building of a scalable quantum computer.

After Grover’s algorithm, there was a hiatus of more than five years before a significantly new 
algorithm was discovered. During that time, other areas of quantum information processing, such 
as quantum error correction, advanced significantly. In the early 2000s, several new algorithms 
were discovered. Like Shor’s algorithm, these algorithms solve specific problems with narrow, 
if important, applications. Novel approaches to constructing quantum algorithms also developed.
Investigations of quantum simulation from a quantum-information-processing point of view
have led to improved classical techniques for simulating quantum systems, as well as novel quantum
approaches. Similarly, the quantum-information-processing point of view has led to novel
insights into classical computing, including new classical algorithms. Furthermore, alternatives to
the standard circuit model of quantum computation have been developed that have led to new quantum
algorithms, breakthroughs in building quantum computers, new approaches to robustness,
and significant insights into the key elements of quantum computation.

However long it takes to build a scalable quantum computer and whatever the breadth of 
applications turns out to be, quantum information processing has changed forever the way in which 
quantum physics is understood. The quantum information processing view of quantum mechanics 
has done much to clarify the character of key aspects of quantum mechanics such as quantum 
measurement and entanglement. This advancement in knowledge has already had applications 
outside of quantum information processing to the creation of highly entangled states used for 
microlithography at scales below the wavelength limit and for extraordinarily accurate sensors. 
The precise practical consequences of this increased understanding of nature are hard to predict, 
but the unification of the two theories that had the most profound influence on the technological 
advances of the twentieth century can hardly fail to have profound effects on technological and 
intellectual developments throughout the twenty-first. 

Part I of this book covers the basic building blocks of quantum information processing: quantum
bits and quantum gates. Physical motivation for these building blocks is given and tied to the
key quantum concepts of quantum measurement, quantum state transformations, and entanglement
between quantum subsystems. Each of these concepts is explored in depth. Quantum key
distribution, quantum teleportation, and quantum dense coding are introduced along the way. The
final chapter of part I shows that anything that can be done on a classical computer can be done 
with comparable efficiency on a quantum computer. 

Part II covers quantum algorithms. It begins with a description of some of the most common 
elements of quantum computation. Since the advantage of quantum computation over classical 
computation is all about efficiency, part II carefully defines notions of complexity. Part II also 
discusses known bounds on the power of quantum computation. A number of simple algorithms 
are described. Full chapters are devoted to Shor’s algorithm and Grover’s algorithm. 

Part III explores entanglement and robust quantum computation. A discussion of quantum 
subsystems leads into discussions of quantifying entanglement and of decoherence, the environmental
errors affecting a quantum system because it is really a part of a larger quantum system.
The elegant and important topic of quantum error correction fills a chapter, followed by a chapter 
on techniques to achieve fault tolerance. The book finishes with brief descriptions and pointers 
to references for many quantum information processing topics the book could not cover in depth. 
These include further quantum algorithms and protocols, adiabatic, cluster state, holonomic, and 
topological quantum computing, and the impact quantum information processing has had on 
classical computer science and physics. 




I 


QUANTUM BUILDING BLOCKS 


Quantum mechanics, that mysterious, confusing discipline, which none of us really understands, but which 
we know how to use. 

—Murray Gell-Mann [126] 




2 


Single-Qubit Quantum Systems 


Quantum bits are the fundamental units of information in quantum information processing in 
much the same way that bits are the fundamental units of information for classical processing. 
Just as there are many ways to realize classical bits physically (two voltage levels, lights on or off 
in an array, positions of toggle switches), there are many ways to realize quantum bits physically. 
As is done in classical computer science, we will concern ourselves only rarely with how the 
quantum bits are realized. For the sake of concretely illustrating quantum bits and their properties, 
however, section 2.1 looks at the behavior of polarized photons, one of many possible realizations 
of quantum bits. 

Section 2.2 abstracts key properties from the polarized photon example of section 2.1 to give 
a precise definition of a quantum bit, or qubit, and a description of the behavior of quantum bits 
under measurement. Dirac’s bra/ket notation, the standard notation used throughout quantum 
information processing as well as quantum mechanics, is introduced in this section. Section 2.4 
describes the first application of quantum information processing: quantum key distribution. The 
chapter concludes with a detailed discussion of the state space of a single-qubit system. 

2.1 The Quantum Mechanics of Photon Polarization 

A simple experiment illustrates some of the nonintuitive behavior of quantum systems, behavior 
that is exploited to good effect in quantum algorithms and protocols. This experiment can be 
performed by the reader using only minimal equipment: a laser pointer and three polaroids 
(polarization filters), readily available from any camera supply store. The formalisms of quantum 
mechanics that describe this simple experiment lead directly to a description of the quantum bit, 
the fundamental unit of quantum information on which quantum information processing is done. 
The experiment not only gives a concrete realization of a quantum bit, but it also illustrates key 
properties of quantum measurement. We encourage you to obtain the equipment and perform the 
experiment yourself. 
2.1.1 A Simple Experiment

Shine a beam of light on a projection screen. When polaroid A is placed between the light source 
and the screen, the intensity of the light reaching the screen is reduced. Let us suppose that the 
polarization of polaroid A is horizontal (figure 2.1). 

Next, place polaroid C between polaroid A and the projection screen. If polaroid C is rotated 
so that its polarization is orthogonal (vertical) to the polarization of A, no light reaches the screen 
(figure 2.2). 



Figure 2.1: Single polaroid attenuates unpolarized light by 50 percent.

Figure 2.2: Two orthogonal polaroids block all photons.

Figure 2.3: Inserting a third polaroid allows photons to pass.


Finally, place polaroid B between polaroids A and C. One might expect that adding another 
polaroid will not make any difference; if no light got through two polaroids, then surely no light 
will pass through three! Surprisingly, at most polarization angles of B, light shines on the screen. 
The intensity of this light will be maximal if the polarization of B is at 45 degrees to both A and 
C (figure 2.3). 

Clearly the polaroids cannot be acting as simple sieves; otherwise, inserting polaroid B could 
not increase the number of photons reaching the screen. 

2.1.2 A Quantum Explanation 

For a bright beam of light, there is a classical explanation of the experiment in terms of waves. 
Versions of the experiment described here, using light so dim that only one photon at a time 
interacts with the polaroid, have been done with more sophisticated equipment. The results of 
these single photon experiments can be explained only using quantum mechanics; the classical 
wave explanation no longer works. Furthermore, it is not just light that behaves in this peculiar 
way. The quantum mechanical explanation of the experiment consists of two parts: a model of a 
photon’s polarization state and a model of the interaction between a polaroid and a photon. The 
description of this experiment, and the definition of a qubit, use basic notions of linear algebra 
such as vector, basis, orthonormal, and linear combination. Linear algebra is used throughout 
the book; we briefly remind readers of the meanings of these concepts in section 2.2. Section 2.6 
suggests some books on linear algebra. 

Quantum mechanics models a photon’s polarization state by a unit vector, a vector of length
1, pointing in the appropriate direction. We write $|\uparrow\rangle$ and $|\rightarrow\rangle$ for the unit vectors that represent
vertical and horizontal polarization respectively. Think of $|v\rangle$ as a vector with some arbitrary
label $v$. In quantum mechanics, the standard notation for a vector representing a quantum state
is $|v\rangle$, just as $\vec{v}$ or $\mathbf{v}$ are notations used for vectors in other settings. This notation is part of a
more general notation, Dirac’s notation, that will be explained in more detail in sections 2.2 and
4.1. An arbitrary polarization can be expressed as a linear combination $|v\rangle = a|\uparrow\rangle + b|\rightarrow\rangle$ of the
two basis vectors $|\uparrow\rangle$ and $|\rightarrow\rangle$. For example, $|\nearrow\rangle = \frac{1}{\sqrt{2}}|\uparrow\rangle + \frac{1}{\sqrt{2}}|\rightarrow\rangle$ is a unit vector representing
polarization of 45 degrees. The coefficients $a$ and $b$ in $|v\rangle = a|\uparrow\rangle + b|\rightarrow\rangle$ are called the amplitudes
of $|v\rangle$ in the directions $|\uparrow\rangle$ and $|\rightarrow\rangle$ respectively (see figure 2.4). When $a$ and $b$ are both non-zero,
$|v\rangle = a|\uparrow\rangle + b|\rightarrow\rangle$ is said to be a superposition of $|\uparrow\rangle$ and $|\rightarrow\rangle$.

Figure 2.4: Measurement of state $|v\rangle = a|\uparrow\rangle + b|\rightarrow\rangle$ by a measuring device with preferred basis $\{|\uparrow\rangle, |\rightarrow\rangle\}$.

Quantum mechanics models the interaction between a photon and a polaroid as follows. The
polaroid has a preferred axis, its polarization. When a photon with polarization $|v\rangle = a|\uparrow\rangle + b|\rightarrow\rangle$
meets a polaroid with preferred axis $|\uparrow\rangle$, the photon will get through with probability
$|a|^2$ and will be absorbed with probability $|b|^2$; the probability that a photon passes through the
polaroid is the square of the magnitude of the amplitude of its polarization in the direction of the
polaroid’s preferred axis. The probability that the photon is absorbed by the polaroid is the square
of the magnitude of the amplitude in the direction perpendicular to the polaroid’s preferred axis.
Furthermore, any photon that passes through the polaroid will now be polarized in the direction of
the polaroid’s preferred axis. The probabilistic nature of the interaction and the resulting change
of state are features of all interactions between qubits and measuring devices, no matter what their
physical realization.
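
The measurement rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`polarization`, `amplitude_along`, `pass_probability`) are hypothetical, and angles are taken to be measured from the horizontal, a convention chosen here rather than one given in the text.

```python
import math

def polarization(angle_deg):
    """Amplitudes (a, b) along vertical and horizontal for a linear polarization.

    Assumed convention: the angle is measured from the horizontal axis,
    so 0 degrees is horizontal and 90 degrees is vertical.
    """
    theta = math.radians(angle_deg)
    return (math.sin(theta), math.cos(theta))

def amplitude_along(state, axis):
    """Amplitude of `state` in the direction `axis` (the inner product of the two)."""
    return sum(complex(x).conjugate() * y for x, y in zip(axis, state))

def pass_probability(state, axis):
    """Square of the magnitude of the amplitude along the polaroid's preferred axis."""
    return abs(amplitude_along(state, axis)) ** 2

vertical = polarization(90)
horizontal = polarization(0)
diagonal = polarization(45)

# A 45-degree photon meeting a vertical polaroid.
p_pass = pass_probability(diagonal, vertical)      # |a|^2
p_absorb = pass_probability(diagonal, horizontal)  # |b|^2
print(p_pass, p_absorb)
```

For the 45-degree photon both probabilities come out close to one half, and they sum to 1 because the state is a unit vector.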

In the experiment, any photons that pass through polaroid A will leave polarized in the direction
of polaroid A’s preferred axis, in this case horizontal, $|\rightarrow\rangle$. A horizontally polarized photon has
no amplitude in the vertical direction, so it has no chance of passing through polaroid C, which 
was given a vertical orientation. For this reason, no light reaches the screen. Had polaroid C 
been in any other orientation, a horizontally polarized photon would have some amplitude in the 
direction of polaroid C’s preferred axis, and some photons would reach the screen. 

To understand what happens once polaroid B, with preferred axis $|\nearrow\rangle$, is inserted, it is helpful
to write the horizontally polarized photon’s polarization state $|\rightarrow\rangle$ as

$$|\rightarrow\rangle = \frac{1}{\sqrt{2}}|\nearrow\rangle - \frac{1}{\sqrt{2}}|\nwarrow\rangle,$$

where $|\nwarrow\rangle = \frac{1}{\sqrt{2}}|\uparrow\rangle - \frac{1}{\sqrt{2}}|\rightarrow\rangle$ is the unit vector orthogonal to $|\nearrow\rangle$.

Any photon that passes through polaroid A becomes horizontally polarized, so the amplitude
of any such photon’s state $|\rightarrow\rangle$ in the direction $|\nearrow\rangle$ is $\frac{1}{\sqrt{2}}$. Applying the quantum theory we just
learned tells us that a horizontally polarized photon will pass through polaroid B with probability
$\frac{1}{2} = \left|\frac{1}{\sqrt{2}}\right|^2$. Any photons that have passed through polaroid B now have polarization $|\nearrow\rangle$. When
these photons hit polaroid C, they do have amplitude in the vertical direction, so some of them
(half) will pass through polaroid C and hit the screen (see figure 2.3). In this way, quantum
mechanics explains how more light can reach the screen when the third polaroid is added, and it
provides a means to compute how much light will reach the screen.
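
As a rough sketch of that computation, the Python snippet below follows a horizontally polarized photon (one that has already passed polaroid A) through a sequence of polaroids. The helper names and the angle convention (degrees from the horizontal) are assumptions made for illustration, and only real amplitudes are handled, which is enough for linear polarization.

```python
import math

def polarization(angle_deg):
    """(vertical, horizontal) amplitudes; angle measured from the horizontal (assumed)."""
    theta = math.radians(angle_deg)
    return (math.sin(theta), math.cos(theta))

def transmitted_fraction(state, polaroid_angles):
    """Fraction of photons in `state` that pass every polaroid in the sequence.

    At each polaroid the pass probability is the squared amplitude along its
    preferred axis, and a photon that passes leaves polarized along that axis.
    """
    fraction = 1.0
    for angle in polaroid_angles:
        axis = polarization(angle)
        amplitude = axis[0] * state[0] + axis[1] * state[1]
        fraction *= amplitude ** 2
        state = axis
    return fraction

after_A = polarization(0)                              # horizontal, after polaroid A
only_C = transmitted_fraction(after_A, [90])           # A then C: essentially zero
with_B = transmitted_fraction(after_A, [45, 90])       # A, B at 45 degrees, then C
with_B_at_30 = transmitted_fraction(after_A, [30, 90]) # B at a different angle
print(only_C, with_B, with_B_at_30)
```

With B at 45 degrees, about a quarter of the photons leaving A reach the screen (half pass B, and half of those pass C). Other orientations of B, such as 30 degrees, let less light through, in line with the observation that the intensity is maximal at 45 degrees.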

In summary, the polarization state of a photon is modeled as a unit vector. Its interaction with a 
polaroid is probabilistic and depends on the amplitude of the photon’s polarization in the direction 
of the polaroid’s preferred axis. Either the photon will be absorbed or the photon will leave the
polaroid with its polarization aligned with the polaroid’s preferred axis.

2.2 Single Quantum Bits 

The space of possible polarization states of a photon is an example of a quantum bit, or qubit. A 
qubit has a continuum of possible values: any state represented by a unit vector $a|\uparrow\rangle + b|\rightarrow\rangle$ is
a legitimate qubit value. The amplitudes a and b can be complex numbers, even though complex 
amplitudes were not needed for the explanation of the experiment. (In the photon polarization 
case, the imaginary coefficients correspond to circular polarization.) 

In general, the set of all possible states of a physical system is called the state space of the 
system. Any quantum mechanical system that can be modeled by a two-dimensional complex 
vector space can be viewed as a qubit. (There is redundancy in this representation in that any 
vector multiplied by a modulus one [unit length] complex number represents the same quantum 
state. We discuss this redundancy carefully in sections 2.5 and 3.1.) Such systems, called two- 
state quantum systems, include photon polarization, electron spin, and the ground state together 
with an excited state of an atom. The two-state label for these systems does not mean that the 
state space has only two states—it has infinitely many—but rather that all possible states can be 
represented as a linear combination, or superposition, of just two states. For a two-dimensional 
complex vector space to be viewed as a qubit, two linearly independent states, labeled $|0\rangle$ and $|1\rangle$,
must be distinguished. For the theory of quantum information processing, all two-state systems, 
whether they be electron spin or energy levels of an atom, are equally good. From a practical 
point of view, it is as yet unclear which two-state systems will be most suitable for physical 
realizations of quantum information processing devices such as quantum computers; it is likely 
that a variety of physical representations of qubits will be used.

Dirac’s bra/ket notation is used throughout quantum physics to represent quantum states and 
their transformations. In this section we introduce the part of Dirac’s notation that is used for 
quantum states. Section 4.1 introduces Dirac’s notation for quantum transformations. Familiarity 
and fluency with this notation will help greatly in understanding all subsequent material; we 
strongly encourage readers to work the exercises at the end of this chapter. 





In Dirac’s notation, a ket such as $|x\rangle$, where $x$ is an arbitrary label, refers to a vector representing
a state of a quantum system. A vector $|v\rangle$ is a linear combination of vectors $|s_1\rangle, |s_2\rangle, \ldots, |s_n\rangle$ if
there exist complex numbers $a_i$ such that $|v\rangle = a_1|s_1\rangle + a_2|s_2\rangle + \cdots + a_n|s_n\rangle$.

A set of vectors $S$ generates a complex vector space $V$ if every element $|v\rangle$ of $V$ can be
written as a complex linear combination of vectors in the set: every $|v\rangle \in V$ can be written as
$|v\rangle = a_1|s_1\rangle + a_2|s_2\rangle + \cdots + a_n|s_n\rangle$ for some elements $|s_i\rangle \in S$ and complex numbers $a_i$. Given a
set of vectors $S$, the subspace of all linear combinations of vectors in $S$ is called the span of $S$ and
is denoted $\mathrm{span}(S)$. A set of vectors $B$ for which every element of $V$ can be written uniquely as a
linear combination of vectors in $B$ is called a basis for $V$. In a two-dimensional vector space, any
two vectors that are not multiples of each other form a basis. In quantum mechanics, bases are
usually required to be orthonormal, a property we explain shortly. The two distinguished states,
$|0\rangle$ and $|1\rangle$, are also required to be orthonormal.

An inner product $\langle v_2|v_1\rangle$, or dot product, on a complex vector space $V$ is a complex function
defined on pairs of vectors $|v_1\rangle$ and $|v_2\rangle$ in $V$, satisfying

• $\langle v|v\rangle$ is non-negative real,

• $\langle v_2|v_1\rangle = \overline{\langle v_1|v_2\rangle}$, and

• $(a\langle v_2| + b\langle v_3|)|v_1\rangle = a\langle v_2|v_1\rangle + b\langle v_3|v_1\rangle$,

where $\bar{z}$ is the complex conjugate $\bar{z} = a - ib$ of $z = a + ib$.

Two vectors $|v_1\rangle$ and $|v_2\rangle$ are said to be orthogonal if $\langle v_1|v_2\rangle = 0$. A set of vectors is orthogonal
if all of its members are orthogonal to each other. The length, or norm, of a vector $|v\rangle$ is $\bigl||v\rangle\bigr| = \sqrt{\langle v|v\rangle}$.
Since all vectors $|x\rangle$ representing quantum states are of unit length, $\langle x|x\rangle = 1$ for any
state vector $|x\rangle$. A set of vectors is said to be orthonormal if all of its elements are of length
one and orthogonal to each other: a set of vectors $B = \{|\beta_1\rangle, |\beta_2\rangle, \ldots, |\beta_n\rangle\}$ is orthonormal if
$\langle\beta_i|\beta_j\rangle = \delta_{ij}$ for all $i, j$, where

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise.} \end{cases}$$

In quantum mechanics we are mainly concerned with bases that are orthonormal, so whenever 
we say basis we mean orthonormal basis unless we say otherwise. 

For the state space of a two-state system to represent a quantum bit, two orthonormal distinguished
states, labeled $|0\rangle$ and $|1\rangle$, must be specified. Apart from the requirement that $|0\rangle$ and $|1\rangle$
be orthonormal, the states may be chosen arbitrarily. For instance, in the case of photon polarization,
we may choose $|0\rangle$ and $|1\rangle$ to correspond to the states $|\uparrow\rangle$ and $|\rightarrow\rangle$, or to $|\nearrow\rangle$ and $|\nwarrow\rangle$. We
follow the convention that $|0\rangle = |\uparrow\rangle$ and $|1\rangle = |\rightarrow\rangle$, which implies that $|\nearrow\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and
$|\nwarrow\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$. In the case of electron spin, $|0\rangle$ and $|1\rangle$ could correspond to the spin-up and
spin-down states, or spin-left and spin-right. When talking about qubits, and quantum information
processing in general, a standard basis $\{|0\rangle, |1\rangle\}$ with respect to which all statements are made
must be chosen in advance and remain fixed throughout the discussion. In quantum information
processing, classical bit values of 0 and 1 will be encoded in the distinguished states $|0\rangle$ and $|1\rangle$.
This encoding enables a direct comparison between bits and qubits: bits can take on only two
values, 0 and 1, while qubits can take on not only the values $|0\rangle$ and $|1\rangle$ but also any superposition
of these values, $a|0\rangle + b|1\rangle$, where $a$ and $b$ are complex numbers such that $|a|^2 + |b|^2 = 1$.
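
Vectors and linear transformations can be written using matrix notation once a basis has been
specified. That is, if basis $\{|\beta_1\rangle, |\beta_2\rangle\}$ is specified, a ket $|v\rangle = a|\beta_1\rangle + b|\beta_2\rangle$ can be written
$\begin{pmatrix} a \\ b \end{pmatrix}$; a ket $|v\rangle$ corresponds to a column vector $v$, where $v$ is simply a label, a name for this vector. The
conjugate transpose $v^\dagger$ of a vector

$$v = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}$$

is $v^\dagger = (\bar{a}_1, \ldots, \bar{a}_n)$.

In Dirac’s notation, the conjugate transpose of a ket $|v\rangle$ is called a bra and is written $\langle v|$, so

$$|v\rangle = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \quad\text{and}\quad \langle v| = (\bar{a}_1, \ldots, \bar{a}_n).$$

A bra $\langle v|$ corresponds to a row vector $v^\dagger$.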
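Given two complex vectors

$$|a\rangle = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \quad\text{and}\quad |b\rangle = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix},$$

the standard inner product $\langle a|b\rangle$ is defined to be the scalar obtained by multiplying the conjugate
transpose $\langle a| = (\bar{a}_1, \ldots, \bar{a}_n)$ with $|b\rangle$:

$$\langle a|b\rangle = \langle a||b\rangle = (\bar{a}_1, \ldots, \bar{a}_n) \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = \sum_{i=1}^{n} \bar{a}_i b_i.$$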

When $a = |a\rangle$ and $b = |b\rangle$ are real vectors, this inner product is the same as the standard dot
product on the $n$-dimensional real vector space $\mathbf{R}^n$: $\langle a|b\rangle = a_1 b_1 + \cdots + a_n b_n = a \cdot b$. Dirac’s
choice of bra and ket arose as a play on words: an inner product $\langle a|b\rangle$ of a bra $\langle a|$ and a ket $|b\rangle$
is sometimes called a bracket. The following relations hold, where $|v\rangle = a|0\rangle + b|1\rangle$: $\langle 0|0\rangle = 1$,
$\langle 1|1\rangle = 1$, $\langle 1|0\rangle = \langle 0|1\rangle = 0$, $\langle 0|v\rangle = a$, and $\langle 1|v\rangle = b$.
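
The definitions above can be tried out directly. The following Python sketch represents a ket as a list of complex amplitudes in the standard basis and computes the standard inner product as a bra times a ket. The helper names `bra` and `inner`, and the example amplitudes 0.6 and 0.8i, are illustrative assumptions rather than anything fixed by the text.

```python
def bra(ket):
    """Conjugate transpose of a ket given as a list of complex amplitudes."""
    return [complex(x).conjugate() for x in ket]


def inner(a, b):
    """Standard inner product <a|b> = sum_i conj(a_i) * b_i."""
    return sum(x * y for x, y in zip(bra(a), b))


ket0 = [1, 0]             # |0> in the standard basis
ket1 = [0, 1]             # |1> in the standard basis
a, b = 0.6, 0.8j          # example amplitudes with |a|^2 + |b|^2 = 1
v = [a, b]                # |v> = a|0> + b|1>

print(inner(ket0, ket1))  # orthogonality of the basis states
print(inner(ket0, v))     # recovers the amplitude a
print(inner(ket1, v))     # recovers the amplitude b
print(inner(v, v))        # a unit vector has <v|v> = 1
```

Because the bra conjugates its entries, swapping the arguments of `inner` conjugates the result, in line with the second property of an inner product.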

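In the standard basis, with ordering $\{|0\rangle, |1\rangle\}$, the basis elements $|0\rangle$ and $|1\rangle$ can be expressed
as $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$, and a complex linear combination $|v\rangle = a|0\rangle + b|1\rangle$ can be written
$\begin{pmatrix} a \\ b \end{pmatrix}$. This choice of basis and order of the basis vectors are mere convention. Representing $|0\rangle$ as
$\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ and $|1\rangle$ as $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$, or representing $|0\rangle$ as $\frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $|1\rangle$ as $\frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix}$, would be equally
good as long as it is done consistently. Unless otherwise specified, all vectors and matrices in this
book will be written with respect to the standard basis $\{|0\rangle, |1\rangle\}$ in this order.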

A quantum state $|v\rangle$ is a superposition of basis elements $\{|\beta_1\rangle, |\beta_2\rangle\}$ if it is a nontrivial linear
combination of $|\beta_1\rangle$ and $|\beta_2\rangle$, that is, if $|v\rangle = a_1|\beta_1\rangle + a_2|\beta_2\rangle$ where $a_1$ and $a_2$ are non-zero. For the
term superposition to be meaningful, a basis must be specified. In this book, if we say “superposition”
without explicitly specifying the basis, we implicitly mean with respect to the standard
basis.

Initially the vector/matrix notation will be easier for many readers to use because it is familiar. 
Sometimes matrix notation is convenient for performing calculations, but it always requires the 
choice of a basis and an ordering of that basis. The bra/ket notation has the advantage of being 
independent of basis and the order of the basis elements. It is also more compact and suggests 
correct relationships, as we saw for the inner product, so once it becomes familiar, it is easier to 
read and faster to use. 

Instead of qubits, physical systems with states modeled by three- or $n$-dimensional vector
spaces could be used as fundamental units of computation. Three-valued units are called qutrits,
and $n$-valued units are called qudits. Since qudits can be modeled using multiple qubits, a model
of quantum information based on qudits has the same computational power as one based on qubits. 
For this reason we do not consider qudits further, just as in the classical case most people use a 
bit-based model of information. 

We now have a mathematical model with which to describe quantum bits. In addition, we need 
a mathematical model for measuring devices and their interaction with quantum bits. 

2.3 Single-Qubit Measurement 

The interaction of a polaroid with a photon illustrates key properties of any interaction between 
a measuring device and a quantum system. The mathematical description of the experiment can 
be used to model all measurements of single qubits, whatever their physical instantiation. The 
measurement of more complicated systems retains many of the features of single-qubit measurement:
the probabilistic outcomes and the effect measurement has on the state of the system. This
section considers only measurements of single-qubit systems. Chapter 4 discusses measurements 
of more general quantum systems. 

Quantum theory postulates that any device that measures a two-state quantum system must have
two preferred states whose representative vectors, $\{|u\rangle, |u^\perp\rangle\}$, form an orthonormal basis for the
associated vector space. Measurement of a state transforms the state into one of the measuring
device’s associated basis vectors $|u\rangle$ or $|u^\perp\rangle$. The probability that the state is measured as basis
vector $|u\rangle$ is the square of the magnitude of the amplitude of the component of the state in the
direction of the basis vector $|u\rangle$. For example, given a device for measuring the polarization of
photons with associated basis $\{|u\rangle, |u^\perp\rangle\}$, the state $|v\rangle = a|u\rangle + b|u^\perp\rangle$ is measured as $|u\rangle$ with
probability $|a|^2$ and as $|u^\perp\rangle$ with probability $|b|^2$.

This behavior of measurement is an axiom of quantum mechanics. It is not derivable from 
other physical principles; rather, it is derived from the empirical observation of experiments with 
measuring devices. If quantum mechanics is correct, all devices that measure single qubits must 
behave in this way; all have associated bases, and the measurement outcome is always one of the 
two basis vectors. For this reason, whenever anyone says “measure a qubit,” they must specify
with respect to which basis the measurement takes place. Throughout the book, if we say “measure
a qubit” without further elaboration, we mean that the measurement is with respect to the standard
basis $\{|0\rangle, |1\rangle\}$.

Measurement of a quantum state changes the state. If a state $|v\rangle = a|u\rangle + b|u^\perp\rangle$ is measured
as $|u\rangle$, then the state $|v\rangle$ changes to $|u\rangle$. A second measurement with respect to the same basis will
return $|u\rangle$ with probability 1. Thus, unless the original state happens to be one of the basis states,
a single measurement will change that state, making it impossible to determine the original state 
from any sequence of measurements. 
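
The measurement rule can be mimicked with a pseudorandom number generator. The sketch below is only a classical simulation under the assumptions that states are lists of complex amplitudes in the standard basis and that the measuring device is described by an orthonormal pair of vectors; the function name `measure` and its return convention are hypothetical.

```python
import math
import random


def inner(a, b):
    """Standard inner product <a|b> of two amplitude lists."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))


def measure(state, basis, rng=random):
    """Simulate measuring `state` in the orthonormal `basis` (u, u_perp).

    Returns (index, new_state): index 0 means the outcome was u, index 1
    means it was u_perp; new_state is the basis vector the state changed to.
    """
    u, u_perp = basis
    p_u = abs(inner(u, state)) ** 2      # |a|^2, where a = <u|state>
    if rng.random() < p_u:
        return 0, u
    return 1, u_perp


s = 1 / math.sqrt(2)
standard = ([1, 0], [0, 1])
first, after = measure([s, s], standard)     # outcome is random
second, _ = measure(after, standard)         # repeats the first outcome
print(first, second)
```

The second call always reproduces the first outcome, because after the first measurement the state is one of the basis vectors, so the corresponding probability is exactly 0 or 1.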

While the mathematics of measuring a qubit in the superposition state $a|0\rangle + b|1\rangle$ with respect to
the standard basis is clear, measurement brings up questions as to the meaning of a superposition.
To begin with, the notion of superposition is basis-dependent; all states are superpositions with
respect to some bases and not with respect to others. For instance, $a|0\rangle + b|1\rangle$ is a superposition
with respect to the basis $\{|0\rangle, |1\rangle\}$ but not with respect to $\{a|0\rangle + b|1\rangle, \bar{b}|0\rangle - \bar{a}|1\rangle\}$.

Also, because the result of measuring a superposition is probabilistic, some people are tempted 
to think of the state $|v\rangle = a|0\rangle + b|1\rangle$ as a probabilistic mixture of $|0\rangle$ and $|1\rangle$. It is not. In particular,
it is not true that the state is really either $|0\rangle$ or $|1\rangle$ and that we just do not happen to know which.
Rather, $|v\rangle$ is a definite state, which, when measured in certain bases, gives deterministic results,
while in others it gives random results: a photon with polarization $|\nearrow\rangle = \frac{1}{\sqrt{2}}(|\uparrow\rangle + |\rightarrow\rangle)$ behaves
deterministically when measured with respect to the Hadamard basis $\{|\nearrow\rangle, |\nwarrow\rangle\}$, but it gives
random results when measured with respect to the standard basis $\{|\uparrow\rangle, |\rightarrow\rangle\}$. It is okay to think
of a superposition $|v\rangle = a|0\rangle + b|1\rangle$ as in some sense being in both state $|0\rangle$ and state $|1\rangle$ at the
same time, as long as that statement is not taken too literally: states that are combinations of $|0\rangle$
and $|1\rangle$ in similar proportions but with different amplitudes, such as $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$,
and $\frac{1}{\sqrt{2}}(|0\rangle + i|1\rangle)$, represent distinct states that behave differently in many situations.
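
To make the last point concrete, the following Python sketch compares the three states just mentioned. It assumes amplitude lists in the standard basis; `prob` is an illustrative helper that returns the probability of obtaining a given basis vector as the measurement outcome.

```python
import math


def inner(a, b):
    """Standard inner product <a|b> of two amplitude lists."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))


def prob(outcome, state):
    """Probability that `state` is measured as the basis vector `outcome`."""
    return abs(inner(outcome, state)) ** 2


s = 1 / math.sqrt(2)
zero = [1, 0]
plus = [s, s]           # (|0> + |1>)/sqrt(2)
minus = [s, -s]         # (|0> - |1>)/sqrt(2)
i_state = [s, s * 1j]   # (|0> + i|1>)/sqrt(2)

for name, state in (("plus", plus), ("minus", minus), ("i", i_state)):
    # Probability of outcome |0> in the standard basis, then of the
    # outcome (|0> + |1>)/sqrt(2) in the Hadamard basis.
    print(name, prob(zero, state), prob(plus, state))
```

All three states give outcome $|0\rangle$ with probability one half in the standard basis, yet in the Hadamard basis the first is measured as $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ with certainty, the second never, and the third half of the time.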

Given that qubits can take on any one of infinitely many states, one might hope that a single 
qubit could store lots of classical information. However, the properties of quantum measurement 
severely restrict the amount of information that can be extracted from a qubit. Information about 
a quantum bit can be obtained only by measurement, and any measurement results in one of only 
two states, the two basis states associated with the measuring device; thus, a single measurement 
yields at most a single classical bit of information. Because measurement changes the state, one 
cannot make two measurements on the original state of a qubit. Furthermore, section 5.1.1 shows 
that an unknown quantum state cannot be cloned, which means it is not possible to measure a 
qubit’s state in two ways, even indirectly by copying the qubit’s state and measuring the copy.
Thus, even though a quantum bit can be in infinitely many different superposition states, it is
possible to extract only a single classical bit’s worth of information from a single quantum bit. 

2.4 A Quantum Key Distribution Protocol 

The quantum theory introduced so far is sufficient to describe a first application of quantum 
information processing: a key distribution protocol that relies on quantum effects for its security 
and for which there is no classical analog. 

Keys—binary strings or numbers chosen randomly from a sufficiently large set—provide the 
security for most cryptographic protocols, from encryption to authentication to secret sharing. 
For this reason, the establishment of keys between the parties who wish to communicate is of 
fundamental importance in cryptography. Two general classes of keys exist: symmetric keys and 
public-private key pairs. Both types are used widely, often in conjunction, in a wide variety of 
practical settings, from secure e-commerce transactions to private communication over public 
networks. 

Public-private key pairs consist of a public key, knowable by all, and a corresponding private 
key whose secrecy must be carefully guarded by the owner. Symmetric keys consist of a single 
key (or a pair of keys easily computable from one another) that are known to all of the legitimate 
parties and no one else. In the symmetric key case, multiple parties are responsible for guarding 
the security of the key. 

Quantum key distribution protocols establish a symmetric key between two parties, who are 
generally known in the cryptographic community as Alice and Bob. Quantum key distribution 
protocols can be used securely anywhere classical key agreement protocols such as Diffie-Hellman 
can be used. They perform the same task; however, the security of quantum key distribution rests 
on fundamental properties of quantum mechanics, whereas classical key agreement protocols 
rely on the computational intractability of a certain problem. For example, while Diffie-Hellman 
remains secure against all known classical attacks, the problem on which it is based, the discrete 
logarithm problem, is tractable on a quantum computer. Section 8.6.1 discusses Shor’s quantum 
algorithm for the discrete log problem. 

The earliest quantum key distribution protocol is known as BB84 after its inventors, Charles 
Bennett and Gilles Brassard, and the year of the invention. The aim of the BB84 protocol is 
to establish a secret key, a random sequence of bit values 0 and 1, known only to the two 
parties, Alice and Bob, who may use this key to support a cryptographic task such as exchanging 
secret messages or detecting tampering. The BB84 protocol enables Alice and Bob to be sure 
that if they detect no problems while attempting to establish a key, then with high probability 
it is secret. The protocol does not guarantee, however, that they will succeed in establishing a 
private key. 

Suppose Alice and Bob are connected by two public channels: an ordinary bidirectional classical 
channel and a unidirectional quantum channel. The quantum channel allows Alice to send a 
sequence of single qubits to Bob; in our case we suppose the qubits are encoded in the polarization 
states of individual photons. Both channels can be observed by an eavesdropper Eve. This situation
is illustrated in figure 2.5.

Figure 2.5

Alice and Bob wish to agree on a common key not known to Eve.

To begin the process of establishing a private key, Alice uses quantum
or classical means to generate a random sequence of classical bit values. As we will see, a random 
subset of this sequence will be the final private key. Alice then randomly encodes each bit of 
this sequence in the polarization state of a photon by randomly choosing for each bit one of the 
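following two agreed-upon bases in which to encode it: the standard basis,

$$\begin{aligned} 0 &\to |\uparrow\rangle \\ 1 &\to |\rightarrow\rangle, \end{aligned}$$

or the Hadamard basis,

$$\begin{aligned} 0 &\to |\nearrow\rangle = \tfrac{1}{\sqrt{2}}(|\uparrow\rangle + |\rightarrow\rangle) \\ 1 &\to |\nwarrow\rangle = \tfrac{1}{\sqrt{2}}(|\uparrow\rangle - |\rightarrow\rangle). \end{aligned}$$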

She sends this sequence of photons to Bob through the quantum channel. 

Bob measures the state of each photon he receives by randomly picking either basis. Over 
the classical channel, Alice and Bob check that Bob has received a photon for every one Alice 
has sent, and only then do Alice and Bob tell each other the bases they used for encoding and 
decoding (measuring) each bit. When their choices of basis agree, Bob’s measured bit value agrees
with the bit value that Alice sent. When they chose different bases, the chance that Bob’s bit 
matches Alice’s is only 50 percent. Without revealing the bit values themselves, which would 
also reveal the values to Eve, there is no way for Alice and Bob to figure out which of these bit 
values agree and which do not. So they simply discard all the bits on which their choice of bases 
differed. An average of 50 percent of all bits transmitted remain. Then, depending on the level of
assurance they require, Alice and Bob compare a certain number of bit values to check that no
eavesdropping has occurred. These bits will also be discarded, and only the remaining bits will 
be used as their private key. 
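
The sifting step just described can be sketched as a small classical simulation. The sketch assumes no eavesdropper and a noiseless channel, and it abstracts Bob's measurement into the rule stated above: the same basis reproduces Alice's bit, a different basis gives a uniformly random bit. The function name `bb84_sift` and the 0/1 encoding of the basis choice are assumptions made for illustration.

```python
import random


def bb84_sift(n, rng):
    """Simulate n BB84 transmissions without Eve; return the sifted bit lists."""
    alice_key, bob_key = [], []
    for _ in range(n):
        bit = rng.randrange(2)            # Alice's random bit
        alice_basis = rng.randrange(2)    # 0 = standard, 1 = Hadamard
        bob_basis = rng.randrange(2)      # Bob picks his basis independently
        if bob_basis == alice_basis:
            bob_bit = bit                 # same basis: Bob recovers Alice's bit
        else:
            bob_bit = rng.randrange(2)    # different basis: 50/50 outcome
        # Bases are compared over the classical channel; mismatches are discarded.
        if bob_basis == alice_basis:
            alice_key.append(bit)
            bob_key.append(bob_bit)
    return alice_key, bob_key


alice_key, bob_key = bb84_sift(10000, random.Random(0))
print(len(alice_key), alice_key == bob_key)
```

In this idealized setting the two sifted lists agree exactly and contain roughly half of the transmitted bits. A subset of them would then be compared and discarded as described above.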

We describe one sort of attack that Eve can make and how quantum aspects of this protocol 
guard against it. On the classical channel, Alice and Bob discuss only the choice of bases and 
not the bit values themselves, so Eve cannot gain any information about the key from listening to 
the classical channel alone. To gain information, Eve must intercept the photons transmitted by 
Alice through the quantum channel. Eve must send photons to Bob before knowing the choice of 
bases made by Alice and Bob, because they compare bases only after Bob has confirmed receipt 
of the photons. If she sends different photons to Bob, Alice and Bob will detect that something is 
wrong when they compare bit values, but if she sends the original photons to Bob without doing 
anything, she gains no information. 

To gain information, Eve makes a measurement before sending the photons to Bob. Instead of 
using a polaroid to measure, she can use a calcite crystal and a photon detector; a beam of light 
passing through a calcite crystal is split into two spatially separated beams, one polarized in the 
direction of the crystal’s optic axis and the other polarized in the direction perpendicular to the 
optic axis. A photon detector placed in one of the beams performs a quantum measurement: 
the probability with which a photon ends up in one of the beams can be calculated just as described 
in section 2.3. 

Since Alice has not yet told Bob her sequence of bases, Eve does not know in which basis 
to measure each bit. If she randomly measures the bits, she will measure using the wrong basis 
approximately half of the time. (Exercise 2.10 examines the case in which Eve does not even know 
which two bases to choose from.) When she uses the wrong basis to measure, the measurement 
changes the polarization of the photon before it is resent to Bob. This change in the polarization 
means that, even if Bob measures the photon in the same basis as Alice used to encode the bit, he 
will get the correct bit value only half the time. 

Overall, for each of the qubits Alice and Bob retain, if the qubit was measured by Eve before 
she sent it to Bob, there will be a 25 percent chance that Bob measures a different bit value than 
the one Alice sent. Thus, this attack on the quantum channel is bound to introduce a high error 
rate that Alice and Bob detect by comparing a sufficient number of bits over the classical channel. 
If these bits agree, they can confidently use the remaining bits as their private key. So, not only 
is it likely that 25 percent of Eve’s version of the key is incorrect, but the fact that someone is 
eavesdropping can be detected by Alice and Bob. Thus Alice and Bob run little risk of establishing 
a compromised key; either they succeed in creating a private key or they detect that eavesdropping 
has taken place. 
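
The 25 percent figure can be checked by enumerating all cases of this intercept-and-resend attack. The sketch below assumes that Alice chooses bit and basis uniformly, that Eve chooses one of the two bases uniformly and resends the state she measured, and that Bob measures in Alice's basis (the only qubits they retain). The table `BASES` and the function `error_rate_with_eve` are illustrative names.

```python
import math
from itertools import product

S = 1 / math.sqrt(2)
# BASES[basis][bit] is the state encoding `bit` in `basis`
# (0 = standard, 1 = Hadamard), as real (vertical, horizontal) amplitudes.
BASES = [
    [(1.0, 0.0), (0.0, 1.0)],
    [(S, S), (S, -S)],
]


def prob(outcome, state):
    """Probability of measuring the real-amplitude `state` as `outcome`."""
    return (outcome[0] * state[0] + outcome[1] * state[1]) ** 2


def error_rate_with_eve():
    """Probability that Bob's bit differs from Alice's on a retained, intercepted qubit."""
    error = 0.0
    for bit, basis, eve_basis, eve_bit, bob_bit in product(range(2), repeat=5):
        sent = BASES[basis][bit]
        resent = BASES[eve_basis][eve_bit]          # state after Eve's measurement
        p = 0.25 * 0.5                              # Alice's bit and basis, Eve's basis
        p *= prob(resent, sent)                     # Eve obtains eve_bit
        p *= prob(BASES[basis][bob_bit], resent)    # Bob obtains bob_bit in Alice's basis
        if bob_bit != bit:
            error += p
    return error


print(error_rate_with_eve())
```

When Eve guesses the right basis she introduces no error; when she guesses wrong, which happens half the time, Bob's result is wrong half the time, and the enumeration combines these into an overall error probability of one quarter.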

Eve does not know in which basis to measure the qubits, a property crucial to the security of this 
protocol, because Alice and Bob share information about which bases they used only after Bob has 
received the photons; if Eve knew in which basis to measure the photons, her measurements would 
not change the state, and she could obtain the bit values without Bob and Alice noticing anything 
suspicious. A seemingly easy way for Eve to overcome this obstacle is for her to copy the qubit, 
keeping a copy for herself while sending the original on to Bob. Then she can measure her copy
later after learning the correct basis from listening in on the classical channel. Such a protocol is
defeated by an important property of quantum information. As we will show in section 5.1.1, the 
no-cloning principle of quantum mechanics means that it is impossible to reliably copy quantum 
information unless a basis in which it is encoded is known; all quantum copying machines are 
basis dependent. Copying with the wrong machine not only does not produce an accurate copy, 
but it also changes the original in much the same way measuring in the wrong basis does. So Bob 
and Alice would detect attempts to copy with high probability. 

The security of this protocol, like other pure key distribution protocols such as Diffie-Hellman, 
is vulnerable to a man-in-the-middle attack in which Eve impersonates Bob to Alice and impersonates
Alice to Bob. To guard against such an attack, Alice and Bob need to combine it with an
authentication protocol, be it recognizing each other’s voices or a more mathematical authentication
protocol.

More sophisticated versions of this protocol exist that support quantum key distribution through 
noisy channels and stronger guarantees about the amount of information Eve can gain. In the noisy 
case, Eve is able to gain some information initially, but techniques of quantum error correction 
and privacy amplification can reduce the amount of information Eve gains to arbitrarily low levels 
as well as compensate for the noise in the channels. 

2.5 The State Space of a Single-Qubit System 

The state space of a classical or quantum physical system is the set of all possible states of the system.
Depending on which properties of the system are under consideration, a state of the system
consists of any combination of the positions, momenta, polarizations, spins, energy, and so on of 
the particles in the system. When we are considering only polarization states of a single photon, 
the state space is all possible polarizations. More generally, the state space for a single qubit, no 
matter how it is realized, is the set of possible qubit values,

$$\{a|0\rangle + b|1\rangle\},$$

where $|a|^2 + |b|^2 = 1$ and $a|0\rangle + b|1\rangle$ and $a'|0\rangle + b'|1\rangle$ are considered the same qubit value if
$a|0\rangle + b|1\rangle = c(a'|0\rangle + b'|1\rangle)$ for some modulus one complex number $c$.

2.5.1 Relative Phases versus Global Phases 

That the same quantum state is represented by more than one vector means that there is a critical 
distinction between the complex vector space in which we write our qubit values and the quantum 
state space itself. We have reduced the ambiguity by requiring that vectors representing quantum 
states be unit vectors, but some ambiguity remains: unit vectors equivalent up to multiplication by 
a complex number of modulus one represent the same state. The multiple by which two vectors 
representing the same quantum state differ is called the global phase and has no physical meaning.
We use the equivalence relation $|v\rangle \sim |v'\rangle$ to indicate that $|v\rangle = c|v'\rangle$ for some complex
global phase $c = e^{i\phi}$. The space in which two two-dimensional complex vectors are considered
equivalent if they are multiples of each other is called complex projective space of dimension one.
This quotient space, a space obtained by identifying sets of equivalent vectors with a single point
in the space, is expressed with the compact notation used for quotient spaces:

$$\mathbf{CP}^1 = \{a|0\rangle + b|1\rangle\}/\sim.$$

So the quantum state space for a single-qubit system is in one-to-one correspondence with the
points of the complex projective space $\mathbf{CP}^1$. We will make no further use of $\mathbf{CP}^1$ in this book,
but it is used in the quantum information processing literature.

Because the linearity of vector spaces makes them easier to work with than projective spaces
(we know how to add vectors and there is no corresponding way of adding points in projective
spaces), we generally perform all calculations in the vector space corresponding to the
quantum state space. The multiplicity of representations of a single quantum state in this vector
space representation, however, is a common source of confusion for newcomers to the
field.

A physically important quantity is the relative phase of a single-qubit state $a|0\rangle + b|1\rangle$. The
relative phase (in the standard basis) of a superposition $a|0\rangle + b|1\rangle$ is a measure of the angle in
the complex plane between the two complex numbers $a$ and $b$. More precisely, the relative phase
is the modulus one complex number $e^{i\phi}$ satisfying $a/b = e^{i\phi}|a|/|b|$. Two superpositions $a|0\rangle + b|1\rangle$
and $a'|0\rangle + b'|1\rangle$ whose amplitudes have the same magnitudes but that differ in a relative
phase represent different states.

The physically meaningful relative phase and the physically meaningless global phase should
not be confused. While multiplication with a unit constant does not change the quantum state that a vector represents,
relative phases in a superposition do represent distinct quantum states: even though $|v_1\rangle \sim e^{i\phi}|v_1\rangle$,
the vectors $\frac{1}{\sqrt{2}}(e^{i\phi}|v_1\rangle + |v_2\rangle)$ and $\frac{1}{\sqrt{2}}(|v_1\rangle + |v_2\rangle)$ do not represent the same state. We must always
be cognizant of the $\sim$ equivalence when we interpret the results of our computations as quantum
states.
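
One way to test the $\sim$ equivalence numerically is to use the fact that two unit vectors differ only by a global phase exactly when the magnitude of their inner product is 1. The Python sketch below relies on that criterion; the function name `same_state`, the tolerance, and the example phase $\pi/3$ are assumptions chosen for illustration.

```python
import cmath
import math


def inner(a, b):
    """Standard inner product <a|b> of two amplitude lists."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))


def same_state(v, w, tol=1e-9):
    """True if the unit vectors v and w differ only by a global phase."""
    return abs(abs(inner(v, w)) - 1.0) < tol


s = 1 / math.sqrt(2)
phase = cmath.exp(1j * math.pi / 3)       # an example modulus one factor

plus = [s, s]                             # (|0> + |1>)/sqrt(2)
global_shift = [phase * s, phase * s]     # whole vector multiplied by the phase
relative_shift = [phase * s, s]           # only the |0> amplitude multiplied

print(same_state(plus, global_shift))     # True: same quantum state
print(same_state(plus, relative_shift))   # False: a different quantum state
```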

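A few single-qubit states will be referred to often enough that we give them special labels:

$$|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \tag{2.1}$$

$$|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) \tag{2.2}$$

$$|i\rangle = \frac{1}{\sqrt{2}}(|0\rangle + i|1\rangle) \tag{2.3}$$

$$|{-i}\rangle = \frac{1}{\sqrt{2}}(|0\rangle - i|1\rangle). \tag{2.4}$$

The basis $\{|+\rangle, |-\rangle\}$ is referred to as the Hadamard basis. We sometimes use the notation
$\{|\nearrow\rangle, |\nwarrow\rangle\}$ for the Hadamard basis when discussing photon polarization.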

Some authors omit normalization factors, allowing vectors of any length to represent a state
where two vectors represent the same state if they differ by any complex factor. We will explicitly
write the normalization factors, both because then the amplitudes have a more direct relation to
the measurement probabilities and because keeping track of the normalization factor provides a
check that helps avoid errors.

2.5.2 Geometric Views of the State Space of a Single Qubit

While we primarily use vectors to represent quantum states, it is helpful to have models of the 
single-qubit state space in which there is a one-to-one correspondence between states and points 
in the space. We give two related but different geometric models with this property. The second 
of these, the Bloch sphere model, will be used in section 5.4.1 to illustrate single-qubit quantum 
transformations, and in chapter 10 it will be generalized to aid in the discussion of single-qubit 
subsystems. These models are just different ways of looking at complex projective space of 
dimension 1. As we will see, complex projective space of dimension 1 can be viewed as a sphere. 
First we show that it can be viewed as the extended complex plane, the complex plane C together 
with an additional point traditionally labeled $\infty$.

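Extended Complex Plane $\mathbf{C} \cup \{\infty\}$  A correspondence between the set of all complex numbers and
single-qubit states is given by

$$a|0\rangle + b|1\rangle \mapsto b/a = \alpha$$

and its inverse

$$\alpha \mapsto \frac{1}{\sqrt{1 + |\alpha|^2}}|0\rangle + \frac{\alpha}{\sqrt{1 + |\alpha|^2}}|1\rangle.$$

The preceding mapping is not defined for the state with $a = 0$ and $b = 1$. To make this correspondence
one-to-one we need to add a single point, which we label $\infty$, to the complex plane and
define $\infty \leftrightarrow |1\rangle$. For example, we have

$$\begin{aligned} |0\rangle &\mapsto 0 \\ |1\rangle &\mapsto \infty \\ |+\rangle &\mapsto +1 \\ |-\rangle &\mapsto -1 \\ |i\rangle &\mapsto i \\ |{-i}\rangle &\mapsto -i. \end{aligned}$$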

We now describe another useful model, related to but different from the previous one. 

Bloch Sphere  Starting with the previous representation, we can map each state, represented by the
complex number $\alpha = s + it$, onto the unit sphere in three real dimensions, the points $(x, y, z) \in \mathbf{R}^3$
satisfying $|x|^2 + |y|^2 + |z|^2 = 1$, via the standard stereographic projection

$$(s, t) \mapsto \left(\frac{2s}{|\alpha|^2 + 1}, \frac{2t}{|\alpha|^2 + 1}, \frac{1 - |\alpha|^2}{|\alpha|^2 + 1}\right),$$

further requiring that $\infty \mapsto (0, 0, -1)$. Figure 2.6 illustrates the following correspondences:

$$\begin{aligned} |0\rangle &\mapsto (0, 0, 1) \\ |1\rangle &\mapsto (0, 0, -1) \\ |+\rangle &\mapsto (1, 0, 0) \\ |-\rangle &\mapsto (-1, 0, 0) \\ |i\rangle &\mapsto (0, 1, 0) \\ |{-i}\rangle &\mapsto (0, -1, 0). \end{aligned}$$

Figure 2.6

Location of certain single-qubit states on the surface of the Bloch sphere.
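
The two maps just described compose into a map from amplitude pairs to points on the sphere, which the following Python sketch implements. Representing the extra point $\infty$ by `math.inf` is a choice made here for convenience, and the function names `to_extended_plane` and `to_bloch` are illustrative.

```python
import math


def to_extended_plane(a, b):
    """Map a|0> + b|1> to alpha = b/a, or to math.inf when a = 0 (the state |1>)."""
    return math.inf if a == 0 else b / a


def to_bloch(alpha):
    """Stereographic projection of alpha = s + it onto the unit sphere."""
    if alpha == math.inf:
        return (0.0, 0.0, -1.0)
    alpha = complex(alpha)
    s, t = alpha.real, alpha.imag
    d = abs(alpha) ** 2 + 1
    return (2 * s / d, 2 * t / d, (1 - abs(alpha) ** 2) / d)


r = 1 / math.sqrt(2)
states = {
    "|0>": (1, 0),
    "|1>": (0, 1),
    "|+>": (r, r),
    "|->": (r, -r),
    "|i>": (r, r * 1j),
    "|-i>": (r, -r * 1j),
}
for name, (a, b) in states.items():
    print(name, to_bloch(to_extended_plane(a, b)))
```

Up to floating-point rounding, the printed points match the six correspondences listed above.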

We have given three representations of the quantum state space for a single-qubit system. 

1. Vectors written in ket notation: $a|0\rangle + b|1\rangle$ with complex coefficients $a$ and $b$, subject to
$|a|^2 + |b|^2 = 1$, where $a$ and $b$ are unique up to a unit complex factor. Because of this factor, the
global phase, this representation is not one-to-one.

2. Extended complex plane: a single complex number $\alpha \in \mathbf{C}$ or $\infty$. This representation is
one-to-one.

3. Bloch sphere: points $(x, y, z)$ on the unit sphere. This representation is also one-to-one.

As we will see in section 10.1, the points in the interior of the sphere also have meaning for 
quantum information processing. For historical reasons, the entire ball, including the interior, is 
called the Bloch sphere, instead of just the states on the surface, which truly form a sphere. For 
this reason, we refer to the state space of a single-qubit system as the surface of the Bloch sphere
(figure 2.6). 

One of the advantages of the Bloch sphere representation is that it is easy to read off all possible 
bases from the model; orthogonal states correspond to antipodal points of the Bloch sphere. In 
particular, every diameter of the Bloch sphere corresponds to a basis for the single-qubit state 
space. 

The illustration we gave in figure 2.4 differs from the Bloch sphere representation of single-qubit
quantum states in that its angles are half those of the Bloch sphere representation.
In particular, the angle between two states in figure 2.4 has the usual relation to the inner product,
whereas in the Bloch sphere representation the angle is twice the angle in the inner product
formula.
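
The factor of two, and the fact that orthogonal states are antipodal, can be illustrated with a short Python sketch. It is a hedged example, not the book's construction: the helper names are hypothetical, and `bloch_vector` uses the closed form $(2\,\mathrm{Re}(\bar{a}b),\ 2\,\mathrm{Im}(\bar{a}b),\ |a|^2 - |b|^2)$, which is what the stereographic projection gives when $\alpha = b/a$ and $|a|^2 + |b|^2 = 1$.

```python
import math

def inner(u, v):
    """Inner product <u|v> of two states given as (a, b) coefficient pairs."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def bloch_vector(state):
    """Bloch sphere point for a|0> + b|1> (assumed closed form, see lead-in)."""
    a, b = state
    c = a.conjugate() * b
    return (2 * c.real, 2 * c.imag, abs(a) ** 2 - abs(b) ** 2)

def state_angle(u, v):
    """Angle theta with cos(theta) = |<u|v>|, as in the inner product formula."""
    return math.acos(min(1.0, abs(inner(u, v))))

def bloch_angle(u, v):
    """Angle between the two corresponding points on the Bloch sphere."""
    dot = sum(p * q for p, q in zip(bloch_vector(u), bloch_vector(v)))
    return math.acos(max(-1.0, min(1.0, dot)))

r = 1 / math.sqrt(2)
ket0, ket1, plus = (1, 0), (0, 1), (r, r)

# |0> and |+>: pi/4 from the inner product, pi/2 on the sphere.
# |0> and |1>: orthogonal (pi/2), hence antipodal (pi) on the sphere.
```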

2.5.3 Comments on General Quantum State Spaces 

The states of all quantum systems satisfy certain properties that are encapsulated by a linear
differential equation called the Schrödinger wave equation. For this reason, solutions to the Schrödinger
equation are called wave functions, so all quantum states have representations as wave functions.
For the theory of quantum information processing, we do not need to concern ourselves with
properties specific to any of the various possible physical realizations of quantum bits, so we do not
need to look at the details of specific wave function solutions; we can simply view wave functions
as abstract vectors which we will denote by kets such as $|{\rightarrow}\rangle$ or $|0\rangle$.

Since the Schrödinger equation is linear, the sum of two solutions to the Schrödinger equation
and any constant multiple of a solution of the Schrödinger equation are also solutions to the
Schrödinger equation. Thus, the set of solutions to the Schrödinger equation for any quantum
system is a complex vector space. Furthermore, the set of solutions has a natural inner product.
For the theoretical aspects of quantum information processing, considering only finite-dimensional
vector spaces usually suffices. We simply mention that, in the infinite-dimensional case,
the space of solutions satisfies the conditions needed to form a Hilbert space. Hilbert spaces
are frequently mentioned in the literature, since they are the most general case, but in most
papers on quantum information processing, the Hilbert spaces discussed are finite-dimensional,
in which case they are nothing more or less than finite-dimensional complex vector spaces.
We discuss the state spaces of multiple-qubit systems in chapter 3. Just as in the single-qubit
case, there is redundancy in this model. In fact, there is greater redundancy in the vector
space representation of larger quantum systems, which leads to a significantly more complicated
geometry.

2.6 References 

The early essays of Feynman and Manin can be found in [119, 120, 121] and [202, 203]
respectively. The bra/ket notation was first introduced by Dirac in 1958 [103]. It is found in most
quantum mechanics textbooks and is used in virtually all papers on quantum computing.
More information about linear algebra, in particular proofs of facts stated here, can be found in
any linear algebra text, including Strang’s Linear Algebra and Its Applications [265] and Hoffman
and Kunze’s Linear Algebra [152], or in a book on mathematics for physicists such as Bamberg
and Sternberg’s A Course in Mathematics for Students of Physics [30].

The BB84 quantum key distribution protocol was developed by Charles Bennett and Gilles
Brassard [42, 43, 45], building on work of Stephen Wiesner [284]. A related protocol was shown
to be unconditionally secure by Lo and Chau [198]. Their proof was later simplified by Shor
and Preskill [255] and extended to BB84. Another proof was given by Mayers [206]. The BB84
protocol was first demonstrated experimentally by Bennett et al. in 1992 over 30 cm of free space
[37]. Since then, several groups have demonstrated this protocol and other quantum key
distribution protocols over 100 km of fiber optic cable. Bienfang et al. [51] demonstrated quantum
key distribution over 23 km of free space at night, and Hughes et al. have achieved distances of
10 km through free space in daylight [156]. See the ARDA roadmap [157], the QIPC strategic
report [295], and Gisin et al. [130] for detailed overviews of implementation efforts and the
challenges involved. The companies id Quantique, MagiQ, and SmartQuantum currently sell
quantum cryptographic systems implementing the BB84 protocol. Other quantum key
distribution protocols exist. Exercise 2.11 develops the B92 protocol, and section 3.4 describes Ekert’s
entanglement-based quantum key distribution protocol.

While we explain all quantum mechanics needed for the topics covered in this book, the reader
may be interested in books on quantum mechanics. Countless books on quantum mechanics are
available. Greenstein and Zajonc [140] give a readable high-level exposition of quantum
mechanics, including descriptions of many experiments. The third volume of the Feynman Lectures on
Physics [122] is accessible to a large audience. A classical explanation of the polarization
experiment is given in the first volume. Shankar’s textbook [247] defines much more of the notation
and mathematics required for performing calculations than do the previously mentioned books,
and it is quite readable as well. Other textbooks, such as Liboff [194], may be more appropriate
for readers with a physics background.

2.7 Exercises 

Exercise 2.1. Let the direction $|v\rangle$ of polaroid B’s preferred axis be given as a function of $\theta$,
$|v\rangle = \cos\theta\,|{\rightarrow}\rangle + \sin\theta\,|{\uparrow}\rangle$, and suppose that the polaroids A and C remain horizontally and
vertically polarized as in the experiment of section 2.1.1. What fraction of photons reach the
screen? Assume that each photon generated by the laser pointer has random polarization.

Exercise 2.2. Which pairs of expressions for quantum states represent the same state? For those 
pairs that represent different states, describe a measurement for which the probabilities of the two 
outcomes differ for the two states and give these probabilities. 

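a. $|0\rangle$ and $-|0\rangle$

b. $|1\rangle$ and $i|1\rangle$

c. $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $\frac{1}{\sqrt{2}}(-|0\rangle + i|1\rangle)$

d. $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$

e. $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$ and $\frac{1}{\sqrt{2}}(|1\rangle - |0\rangle)$

f. $\frac{1}{\sqrt{2}}(|0\rangle + i|1\rangle)$ and $\frac{1}{\sqrt{2}}(i|1\rangle - |0\rangle)$

g. $\frac{1}{\sqrt{2}}(|+\rangle + |-\rangle)$ and $|0\rangle$

h. $\frac{1}{\sqrt{2}}(|i\rangle - |{-i}\rangle)$ and $|1\rangle$

i. $\frac{1}{\sqrt{2}}(|i\rangle + |{-i}\rangle)$ and $\frac{1}{\sqrt{2}}(|-\rangle + |+\rangle)$

j. $\frac{1}{\sqrt{2}}(|0\rangle + e^{i\pi/4}|1\rangle)$ and $\frac{1}{\sqrt{2}}(e^{-i\pi/4}|0\rangle + |1\rangle)$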

Exercise 2.3. Which states are superpositions with respect to the standard basis, and which are 
not? For each state that is a superposition, give a basis with respect to which it is not a superposition. 

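a. $|+\rangle$

b. $\frac{1}{\sqrt{2}}(|+\rangle + |-\rangle)$

c. $\frac{1}{\sqrt{2}}(|+\rangle - |-\rangle)$

d. $\frac{\sqrt{3}}{2}|+\rangle - \frac{1}{2}|-\rangle$

e. $\frac{1}{\sqrt{2}}(|i\rangle - |{-i}\rangle)$

f. $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$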

Exercise 2.4. Which of the states in 2.3 are superpositions with respect to the Hadamard basis, 
and which are not? 

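Exercise 2.5. Give the set of all values of $\theta$ for which the following pairs of states are equivalent.

a. $|1\rangle$ and $\frac{1}{\sqrt{2}}(|+\rangle + e^{i\theta}|-\rangle)$

b. $\frac{1}{\sqrt{2}}(|i\rangle + e^{i\theta}|{-i}\rangle)$ and $\frac{1}{\sqrt{2}}(|{-i}\rangle + e^{-i\theta}|i\rangle)$

c. $\frac{1}{2}|0\rangle - \frac{\sqrt{3}}{2}|1\rangle$ and $e^{i\theta}\left(\frac{1}{2}|0\rangle - \frac{\sqrt{3}}{2}|1\rangle\right)$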

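Exercise 2.6. For each pair consisting of a state and a measurement basis, describe the possible
measurement outcomes and give the probability for each outcome.

a. $\frac{\sqrt{3}}{2}|0\rangle - \frac{1}{2}|1\rangle$, $\{|0\rangle, |1\rangle\}$

b. $\frac{\sqrt{3}}{2}|1\rangle - \frac{1}{2}|0\rangle$, $\{|0\rangle, |1\rangle\}$

c. $|{-i}\rangle$, $\{|0\rangle, |1\rangle\}$

d. $|0\rangle$, $\{|+\rangle, |-\rangle\}$

e. $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$, $\{|i\rangle, |{-i}\rangle\}$

f. $|1\rangle$, $\{|i\rangle, |{-i}\rangle\}$

g. $|+\rangle$, $\left\{\frac{1}{2}|0\rangle + \frac{\sqrt{3}}{2}|1\rangle,\ \frac{\sqrt{3}}{2}|0\rangle - \frac{1}{2}|1\rangle\right\}$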

Exercise 2.7. For each of the following states, describe all orthonormal bases that include that 
state. 

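a. $\frac{1}{\sqrt{2}}(|0\rangle + i|1\rangle)$

b. $\frac{1+i}{2}|0\rangle - \frac{1-i}{2}|1\rangle$

c. $\frac{1}{\sqrt{2}}(|0\rangle + e^{i\pi/6}|1\rangle)$

d. $\frac{1}{2}|+\rangle - \frac{i\sqrt{3}}{2}|-\rangle$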

Exercise 2.8. Alice is confused. She understands that $|1\rangle$ and $-|1\rangle$ represent the same state. But
she does not understand why that does not imply that $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$ would be
the same state. Can you help her out?

Exercise 2.9. In the BB84 protocol, how many bits do Alice and Bob need to compare to have a 
90 percent chance of detecting Eve’s presence? 

Exercise 2.10. Analyze Eve’s success in eavesdropping on the BB84 protocol if she does not 
even know which two bases to choose from and so chooses a basis at random at each step. 

a. On average, what percentage of bit values of the final key will Eve know for sure after listening 
to Alice and Bob’s conversation on the public channel? 

b. On average, what percentage of bits in her string are correct? 

c. How many bits do Alice and Bob need to compare to have a 90 percent chance of detecting 
Eve’s presence? 

Exercise 2.11. B92 quantum key distribution protocol. In 1992 Bennett proposed the following
quantum key distribution protocol. Instead of encoding each bit in either the standard basis or the
Hadamard basis as is done in the BB84 protocol, Alice encodes her random string $x$ as follows

$$\begin{aligned}
0 &\mapsto |0\rangle \\
1 &\mapsto |+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)
\end{aligned}$$

and sends the resulting qubits to Bob. Bob generates a random bit string $y$. If $y_i = 0$ he measures the $i$th qubit
in the Hadamard basis $\{|+\rangle, |-\rangle\}$; if $y_i = 1$ he measures in the standard basis $\{|0\rangle, |1\rangle\}$. In this
protocol, instead of telling Alice over the public classical channel which basis he used to measure
each qubit, he tells her the results of his measurements. If his measurement resulted in $|+\rangle$ or $|0\rangle$
Bob sends 0; if his measurement indicates the state is $|1\rangle$ or $|-\rangle$, he sends 1. Alice and Bob discard
all bits from strings $x$ and $y$ for which Bob’s bit value from measurement yielded 0, obtaining
strings $x'$ and $y'$. Alice uses $x'$ as the secret key and Bob uses $y'$. Then, depending on the security
level they desire, they compare a number of bits to detect tampering. They discard these check
bits from their key.

Figure 2.7
Bloch sphere representation of single-qubit quantum states.

a. Show that if Bob receives exactly the states Alice sends, then the strings x' and y' are identical 
strings. 

b. Why didn’t Alice and Bob decide to keep the bits of $x$ and $y$ for which Bob’s bit value from
measurement was 0?

c. What if an eavesdropper Eve measures each bit in either the standard basis or the Hadamard
basis to obtain a bit string $z$ and forwards the measured qubits to Bob? On average, how many
bits of Alice and Bob’s key does she know for sure after listening in on the public classical channel? If
Alice and Bob compare $s$ bit values of their strings $x'$ and $y'$, how likely are they to detect Eve’s
presence?

Exercise 2.12. Bloch Sphere: Spherical coordinates:

a. Show that the surface of the Bloch sphere can be parametrized in terms of two real-valued
parameters, the angles $\theta$ and $\phi$ illustrated in figure 2.7. Make sure your parametrization is in
one-to-one correspondence with points on the sphere, and therefore single-qubit quantum states,
in the range $\theta \in [0, \pi]$ and $\phi \in [0, 2\pi]$ except for the points corresponding to $|0\rangle$ and $|1\rangle$.

b. What are $\theta$ and $\phi$ for each of the states $|+\rangle$, $|-\rangle$, $|i\rangle$, and $|{-i}\rangle$?
Exercise 2.13. Relate the four parametrizations of the state space of a single qubit to each other:
Give formulas for 

a. vectors in ket notation 

b. elements of the extended complex plane 

c. spherical coordinates for the Bloch sphere (see exercise 2.12) 
in terms of the x, y, and z coordinates of the Bloch sphere. 

Exercise 2.14. 

a. Show that antipodal points on the surface of the Bloch sphere represent orthogonal states.

b. Show that any two orthogonal states correspond to antipodal points. 
The first glimpse into why encoding information in quantum states might support more efficient
computation comes when examining systems of more than one qubit. Unlike classical systems,
the state space of a quantum system grows exponentially with the number of particles. Thus, when
we encode computational information in quantum states of a system of $n$ particles, there are vastly
more possible computation states available than when classical states are used to encode the
information. The extent to which these large state spaces corresponding to small amounts of physical
space can be used to speed up computation will be the subject of much of the rest of this book.

The enormous difference in dimension between classical and quantum state spaces is due to
a difference in the way the spaces combine. Imagine a macroscopic physical system consisting
of several components. The state of this classical system can be completely characterized by
describing the state of each of its component pieces separately. A surprising and unintuitive
aspect of quantum systems is that often the state of a system cannot be described in terms of the
states of its component pieces. States that cannot be so described are called entangled states.
Entangled states are a critical ingredient of quantum computation.

Entangled states are a uniquely quantum phenomenon; they have no classical counterpart. Most
states in a multiple-qubit system are entangled states; they are what fills the vast quantum state
spaces. The impossibility of efficiently simulating the behavior of entangled states on classical
computers suggested to Feynman, Manin, and others that it might be possible to use these
quantum behaviors to compute more efficiently, leading to the development of the field of quantum
computation.

The first few sections of this chapter will be fairly abstract as we develop the mathematical 
formalism to discuss multiple-qubit systems. We will try to make this material more concrete 
by including many examples. Section 3.1 formally describes the difference between the way 
quantum and classical state spaces combine, the difference between the direct sum of two or more 
vector spaces and the tensor product of a set of vector spaces. Section 3.1 then explores some 
of the implications of this difference, including the exponential increase in the dimension of a 
quantum state space with the number of particles. Section 3.2 formally defines entangled states 
and begins to describe their uniquely quantum behavior. As a first illustration of the usefulness 
of this behavior, section 3.4 discusses a second quantum key distribution scheme. 
3.1 Quantum State Spaces

In classical physics, the possible states of a system of $n$ objects, whose individual states can
be described by a vector in a two-dimensional vector space, can be described by vectors in a
vector space of $2n$ dimensions. Classical state spaces combine through the direct sum. However,
the combined state space of $n$ quantum systems, each with states modeled by two-dimensional
vectors, is much larger. The vector spaces associated with the quantum systems combine through
the tensor product, resulting in a vector space of $2^n$ dimensions. We begin by reviewing the formal
definition of a direct sum as well as of the tensor product in order to compare the two and the
difference in size between the resulting spaces.

3.1.1 Direct Sums of Vector Spaces 

The direct sum $V \oplus W$ of two vector spaces $V$ and $W$ with bases $A = \{|\alpha_1\rangle, |\alpha_2\rangle, \ldots, |\alpha_n\rangle\}$
and $B = \{|\beta_1\rangle, |\beta_2\rangle, \ldots, |\beta_m\rangle\}$ respectively is the vector space with basis
$A \cup B = \{|\alpha_1\rangle, |\alpha_2\rangle, \ldots, |\alpha_n\rangle, |\beta_1\rangle, |\beta_2\rangle, \ldots, |\beta_m\rangle\}$. The order of the basis is arbitrary. Every element
$|x\rangle \in V \oplus W$ can be written as $|x\rangle = |v\rangle \oplus |w\rangle$ for some $|v\rangle \in V$ and $|w\rangle \in W$. For $V$ and $W$ of
dimension $n$ and $m$ respectively, $V \oplus W$ has dimension $n + m$:

$$\dim(V \oplus W) = \dim(V) + \dim(W).$$

Addition and scalar multiplication are defined by performing the operation on the two component
vector spaces separately and adding the results. When $V$ and $W$ are inner product spaces, the
standard inner product on $V \oplus W$ is given by

$$(\langle v_2| \oplus \langle w_2|)(|v_1\rangle \oplus |w_1\rangle) = \langle v_2|v_1\rangle + \langle w_2|w_1\rangle.$$

The vector spaces V and W embed in V © W in the obvious canonical way, and the images are 
orthogonal under the standard inner product. 

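Suppose that the state of each of three classical objects $O_1$, $O_2$, and $O_3$ is fully described by two
parameters, the position $x_i$ and the momentum $p_i$. Then the state of the system can be described
by the direct sum of the states of the individual objects:

$$\begin{pmatrix} x_1 \\ p_1 \end{pmatrix} \oplus \begin{pmatrix} x_2 \\ p_2 \end{pmatrix} \oplus \begin{pmatrix} x_3 \\ p_3 \end{pmatrix} = \begin{pmatrix} x_1 \\ p_1 \\ x_2 \\ p_2 \\ x_3 \\ p_3 \end{pmatrix}.$$

More generally, the state space of $n$ such classical objects has dimension $2n$. Thus the size of the
state space grows linearly with the number of objects.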
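
In coordinates, the direct sum amounts to concatenating the component vectors. The following Python sketch illustrates this under that assumption; the helper names and the numerical position and momentum values are hypothetical.

```python
def direct_sum(*vectors):
    """Concatenate coordinate lists: the direct sum of the component vectors."""
    combined = []
    for v in vectors:
        combined.extend(v)
    return combined

def inner(u, v):
    """Standard inner product <u|v> on coordinate lists."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

# Hypothetical (position, momentum) pairs for three classical objects.
o1, o2, o3 = [0.0, 1.5], [2.0, -0.5], [4.0, 0.25]
state = direct_sum(o1, o2, o3)

# Inner products add across the summands.
v1, v2, w1, w2 = [1, 2], [3, 4], [5, 6], [7, 8]
lhs = inner(direct_sum(v2, w2), direct_sum(v1, w1))
rhs = inner(v2, v1) + inner(w2, w1)

# Dimension of the state space of n two-parameter objects: 2n.
dims = [len(direct_sum(*([[0, 0]] * n))) for n in range(1, 6)]
```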
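3.1.2 Tensor Products of Vector Spaces

The tensor product $V \otimes W$ of two vector spaces $V$ and $W$ with bases $A = \{|\alpha_1\rangle, |\alpha_2\rangle, \ldots, |\alpha_n\rangle\}$
and $B = \{|\beta_1\rangle, |\beta_2\rangle, \ldots, |\beta_m\rangle\}$ respectively is an $nm$-dimensional vector space with a basis
consisting of the $nm$ elements of the form $|\alpha_i\rangle \otimes |\beta_j\rangle$ where $\otimes$ is the tensor product, an abstract
binary operator that satisfies the following relations:

$$\begin{aligned}
(|v_1\rangle + |v_2\rangle) \otimes |w\rangle &= |v_1\rangle \otimes |w\rangle + |v_2\rangle \otimes |w\rangle \\
|v\rangle \otimes (|w_1\rangle + |w_2\rangle) &= |v\rangle \otimes |w_1\rangle + |v\rangle \otimes |w_2\rangle \\
(a|v\rangle) \otimes |w\rangle &= |v\rangle \otimes (a|w\rangle) = a(|v\rangle \otimes |w\rangle).
\end{aligned}$$

Taking $k = \min(n, m)$, all elements of $V \otimes W$ have form

$$|v_1\rangle \otimes |w_1\rangle + |v_2\rangle \otimes |w_2\rangle + \cdots + |v_k\rangle \otimes |w_k\rangle$$

for some $v_i \in V$ and $w_i \in W$. Due to the relations defining the tensor product, such a
representation is not unique. Furthermore, while all elements of $V \otimes W$ can be written

$$\alpha_1(|\alpha_1\rangle \otimes |\beta_1\rangle) + \alpha_2(|\alpha_2\rangle \otimes |\beta_1\rangle) + \cdots + \alpha_{nm}(|\alpha_n\rangle \otimes |\beta_m\rangle),$$

most elements of $V \otimes W$ cannot be written as $|v\rangle \otimes |w\rangle$, where $v \in V$ and $w \in W$. It is common
to write $|v\rangle|w\rangle$ for $|v\rangle \otimes |w\rangle$.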


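Example 3.1.1 Let $V$ and $W$ be two-dimensional vector spaces with orthonormal bases
$A = \{|\alpha_1\rangle, |\alpha_2\rangle\}$ and $B = \{|\beta_1\rangle, |\beta_2\rangle\}$ respectively. Let $|v\rangle = a_1|\alpha_1\rangle + a_2|\alpha_2\rangle$ and
$|w\rangle = b_1|\beta_1\rangle + b_2|\beta_2\rangle$ be elements of $V$ and $W$. Then

$$|v\rangle \otimes |w\rangle = a_1b_1|\alpha_1\rangle \otimes |\beta_1\rangle + a_1b_2|\alpha_1\rangle \otimes |\beta_2\rangle + a_2b_1|\alpha_2\rangle \otimes |\beta_1\rangle + a_2b_2|\alpha_2\rangle \otimes |\beta_2\rangle.$$

If $V$ and $W$ are vector spaces corresponding to a qubit, each with standard basis $\{|0\rangle, |1\rangle\}$,
then $V \otimes W$ has $\{|0\rangle \otimes |0\rangle, |0\rangle \otimes |1\rangle, |1\rangle \otimes |0\rangle, |1\rangle \otimes |1\rangle\}$ as basis. The tensor product of two
single-qubit states $a_1|0\rangle + b_1|1\rangle$ and $a_2|0\rangle + b_2|1\rangle$ is

$$a_1a_2|0\rangle \otimes |0\rangle + a_1b_2|0\rangle \otimes |1\rangle + b_1a_2|1\rangle \otimes |0\rangle + b_1b_2|1\rangle \otimes |1\rangle.$$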


To write examples in the more familiar matrix notation for vectors, we must choose an ordering
for the basis of the tensor product space. For example, we can choose the dictionary ordering
$\{|\alpha_1\rangle|\beta_1\rangle, |\alpha_1\rangle|\beta_2\rangle, |\alpha_2\rangle|\beta_1\rangle, |\alpha_2\rangle|\beta_2\rangle\}$.


Example 3.1.2 With the dictionary ordering of the basis for the tensor product space, the tensor
product of the unit vectors with matrix representation $|v\rangle = \frac{1}{\sqrt{5}}(1, -2)^\dagger$ and $|w\rangle = \frac{1}{\sqrt{10}}(-1, 3)^\dagger$
is the unit vector $|v\rangle \otimes |w\rangle = \frac{1}{5\sqrt{2}}(-1, 3, 2, -6)^\dagger$.
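
With the dictionary ordering, the coordinates of a tensor product are all pairwise products of coordinates, with the first factor's index varying slowest. The sketch below checks Example 3.1.2 this way; `kron` is a hypothetical helper name chosen here, not notation from the text.

```python
import math

def kron(v, w):
    """Coordinates of |v> (x) |w> with the product basis in dictionary order."""
    return [a * b for a in v for b in w]

v = [1 / math.sqrt(5), -2 / math.sqrt(5)]
w = [-1 / math.sqrt(10), 3 / math.sqrt(10)]

vw = kron(v, w)
expected = [c / (5 * math.sqrt(2)) for c in (-1, 3, 2, -6)]
norm = math.sqrt(sum(abs(c) ** 2 for c in vw))
```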
If $V$ and $W$ are inner product spaces, then $V \otimes W$ can be given an inner product by taking the
product of the inner products on $V$ and $W$; the inner product of $|v_1\rangle \otimes |w_1\rangle$ and $|v_2\rangle \otimes |w_2\rangle$ is
given by

$$(\langle v_2| \otimes \langle w_2|) \cdot (|v_1\rangle \otimes |w_1\rangle) = \langle v_2|v_1\rangle \langle w_2|w_1\rangle.$$

The tensor product of two unit vectors is a unit vector, and given orthonormal bases $\{|\alpha_i\rangle\}$ for
$V$ and $\{|\beta_i\rangle\}$ for $W$, the basis $\{|\alpha_i\rangle \otimes |\beta_j\rangle\}$ for $V \otimes W$ is also orthonormal. The tensor product
$V \otimes W$ has dimension $\dim(V) \times \dim(W)$, so the tensor product of $n$ two-dimensional vector
spaces has $2^n$ dimensions.
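
Both claims, that inner products multiply and that dimensions multiply, can be spot-checked numerically. This is a minimal sketch with arbitrary illustrative vectors; `kron` and `inner` are hypothetical helper names.

```python
from functools import reduce

def kron(v, w):
    """Coordinates of |v> (x) |w> in dictionary order."""
    return [a * b for a in v for b in w]

def inner(u, v):
    """Inner product <u|v>, conjugating the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

# Arbitrary (unnormalized) illustrative vectors.
v1, v2 = [1, 1j], [2, -1]
w1, w2 = [0.5, 3], [1j, 1]

lhs = inner(kron(v2, w2), kron(v1, w1))   # <v2 (x) w2 | v1 (x) w1>
rhs = inner(v2, v1) * inner(w2, w1)       # <v2|v1><w2|w1>

# The tensor product of n two-dimensional spaces has 2**n coordinates.
dims = [len(reduce(kron, [[1, 0]] * n)) for n in range(1, 6)]
```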

Most elements $|w\rangle \in V \otimes W$ cannot be written as the tensor product of a vector in $V$ and a
vector in $W$ (though they are all linear combinations of such elements). This observation is of
crucial importance to quantum computation. States of $V \otimes W$ that cannot be written as the tensor
product of a vector in $V$ and a vector in $W$ are called entangled states. As we will see, for most
quantum states of an $n$-qubit system, in particular for all entangled states, it is not meaningful to
talk about the state of a single qubit of the system.

A tensor product structure also underlies probability theory. While the tensor product structure
there is rarely mentioned, a common source of confusion is a tendency to try to impose a direct
sum structure on what is actually a tensor product structure. Readers may find it useful to read
section A.1, which discusses the tensor product structure inherent in probability theory and so
illustrates the use of the tensor product in another, more familiar, context. Readers may also wish to
do exercises A.1 through A.4.

3.1.3 The State Space of an n-Qubit System 

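Given two quantum systems with states represented by unit vectors in $V$ and $W$ respectively, the
possible states of the joint quantum system are represented by unit vectors in the vector space
$V \otimes W$. For $0 \le i < n$, let $V_i$ be the vector space, with basis $\{|0\rangle_i, |1\rangle_i\}$, corresponding to a
single qubit. The standard basis for the vector space $V_{n-1} \otimes \cdots \otimes V_1 \otimes V_0$ for an $n$-qubit system
consists of the $2^n$ vectors

$$\begin{aligned}
\{\,&|0\rangle_{n-1} \otimes \cdots \otimes |0\rangle_1 \otimes |0\rangle_0, \\
&|0\rangle_{n-1} \otimes \cdots \otimes |0\rangle_1 \otimes |1\rangle_0, \\
&|0\rangle_{n-1} \otimes \cdots \otimes |1\rangle_1 \otimes |0\rangle_0, \\
&\qquad\vdots \\
&|1\rangle_{n-1} \otimes \cdots \otimes |1\rangle_1 \otimes |1\rangle_0\,\}.
\end{aligned}$$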

The subscripts are often dropped, since the corresponding qubit is clear from position. The
convention that adjacency of kets means the tensor product enables us to write this basis more
compactly:

$$\begin{aligned}
\{\,&|0\rangle \cdots |0\rangle|0\rangle, \\
&|0\rangle \cdots |0\rangle|1\rangle, \\
&|0\rangle \cdots |1\rangle|0\rangle, \\
&\qquad\vdots \\
&|1\rangle \cdots |1\rangle|1\rangle\,\}.
\end{aligned}$$

Since the tensor product space corresponding to an $n$-qubit system occurs so frequently
throughout quantum information processing, an even more compact and readable notation uses
$|b_{n-1} \ldots b_0\rangle$ to represent $|b_{n-1}\rangle \otimes \cdots \otimes |b_0\rangle$. In this notation the standard basis for an $n$-qubit
system can be written

$$\{|0 \cdots 00\rangle, |0 \cdots 01\rangle, |0 \cdots 10\rangle, \ldots, |1 \cdots 11\rangle\}.$$

Finally, since decimal notation is more compact than binary notation, we will represent the state
$|b_{n-1} \ldots b_0\rangle$ more compactly as $|x\rangle$, where $b_i$ are the digits of the binary representation for the
decimal number $x$. In this notation, the standard basis for an $n$-qubit system is written

$$\{|0\rangle, |1\rangle, |2\rangle, \ldots, |2^n - 1\rangle\}.$$

The standard basis for a two-qubit system can be written as

$$\{|00\rangle, |01\rangle, |10\rangle, |11\rangle\} = \{|0\rangle, |1\rangle, |2\rangle, |3\rangle\},$$

and the standard basis for a three-qubit system can be written as

$$\{|000\rangle, |001\rangle, |010\rangle, |011\rangle, |100\rangle, |101\rangle, |110\rangle, |111\rangle\} = \{|0\rangle, |1\rangle, |2\rangle, |3\rangle, |4\rangle, |5\rangle, |6\rangle, |7\rangle\}.$$

Since the notation |3) corresponds to two different quantum states in these two bases, one a 
two-qubit state, the other a three-qubit state, in order for such notation to be unambiguous, the 
number of qubits must be clear from context. 
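
The dependence of the label $|3\rangle$ on the number of qubits is easy to see in a small Python sketch. The helper names below are hypothetical; the only assumptions are the dictionary ordering of the basis and the numerical sorting of labels described above.

```python
def kron(v, w):
    """Coordinates of |v> (x) |w> in dictionary order."""
    return [a * b for a in v for b in w]

def label(x, n):
    """Binary label b_{n-1}...b_0 of the basis state |x> of an n-qubit system."""
    return format(x, "0{}b".format(n))

def basis_vector(x, n):
    """Coordinates of |x> with basis vectors sorted numerically."""
    vec = [0] * (2 ** n)
    vec[x] = 1
    return vec

def from_bits(bits):
    """Build |b_{n-1}> (x) ... (x) |b_0> from single-qubit basis vectors."""
    single = {"0": [1, 0], "1": [0, 1]}
    vec = [1]
    for b in bits:
        vec = kron(vec, single[b])
    return vec

two_qubit_three = label(3, 2)     # |3> as a two-qubit state
three_qubit_three = label(3, 3)   # |3> as a three-qubit state
```

Here `label(3, 2)` and `label(3, 3)` give different bit strings for the same decimal label, which is why the number of qubits must be known before $|3\rangle$ can be interpreted.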

We often revert to a less compact notation when we wish to set apart certain sets of qubits,
to indicate separate registers of a quantum computer, or to indicate qubits controlled by
different people. If Alice controls the first two qubits and Bob the last three, we may write a state
as $\frac{1}{\sqrt{2}}(|00\rangle|101\rangle + |10\rangle|011\rangle)$, or even as $\frac{1}{\sqrt{2}}(|00\rangle_A|101\rangle_B + |10\rangle_A|011\rangle_B)$, where the subscripts
indicate which qubits Alice controls and which qubits Bob controls.
and

$$\frac{1}{2}(|1\rangle + |2\rangle + |4\rangle + |7\rangle) = \frac{1}{2}(|001\rangle + |010\rangle + |100\rangle + |111\rangle)$$

represent possible states of a three-qubit system.


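To use matrix notation for state vectors of an $n$-qubit system, the order of basis vectors must
be established. Unless specified otherwise, basis vectors labeled with numbers are assumed to be
sorted numerically. Using this convention, the two-qubit state

$$\frac{1}{2}|00\rangle + \frac{i}{2}|01\rangle + \frac{1}{\sqrt{2}}|11\rangle = \frac{1}{2}|0\rangle + \frac{i}{2}|1\rangle + \frac{1}{\sqrt{2}}|3\rangle$$

will have matrix representation

$$\begin{pmatrix} \frac{1}{2} \\ \frac{i}{2} \\ 0 \\ \frac{1}{\sqrt{2}} \end{pmatrix}.$$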

We use the standard basis predominantly, but we use other bases from time to time. For example,
the following basis, the Bell basis for a two-qubit system, $\{|\Phi^+\rangle, |\Phi^-\rangle, |\Psi^+\rangle, |\Psi^-\rangle\}$, where

$$\begin{aligned}
|\Phi^+\rangle &= \tfrac{1}{\sqrt{2}}(|00\rangle + |11\rangle) \\
|\Phi^-\rangle &= \tfrac{1}{\sqrt{2}}(|00\rangle - |11\rangle) \\
|\Psi^+\rangle &= \tfrac{1}{\sqrt{2}}(|01\rangle + |10\rangle) \\
|\Psi^-\rangle &= \tfrac{1}{\sqrt{2}}(|01\rangle - |10\rangle),
\end{aligned} \tag{3.1}$$

is important for various applications of quantum information processing including quantum
teleportation. As in the single-qubit case, a state $|v\rangle$ is a superposition with respect to a set of
orthonormal states $\{|\beta_1\rangle, \ldots, |\beta_i\rangle\}$ if it is a linear combination of these states,
$|v\rangle = a_1|\beta_1\rangle + \cdots + a_i|\beta_i\rangle$, and at least two of the $a_j$ are non-zero. When no set of orthonormal states is specified,
we will mean that the superposition is with respect to the standard basis.
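
The basis dependence of superposition can be made concrete with the Bell basis. The sketch below is an illustration under stated assumptions: the dictionary keys and helper names are hypothetical, and coordinates are taken with respect to the standard basis ordered $|00\rangle, |01\rangle, |10\rangle, |11\rangle$.

```python
import math

r = 1 / math.sqrt(2)

# Coordinates with respect to the standard basis |00>, |01>, |10>, |11>.
bell = {
    "Phi+": [r, 0, 0, r],
    "Phi-": [r, 0, 0, -r],
    "Psi+": [0, r, r, 0],
    "Psi-": [0, r, -r, 0],
}

def inner(u, v):
    """Inner product <u|v>, conjugating the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def is_superposition(coords, tol=1e-12):
    """True if at least two coordinates are non-zero in the basis used."""
    return sum(abs(c) > tol for c in coords) >= 2

# Gram matrix of the Bell states: identity if the basis is orthonormal.
gram = {(j, k): inner(bell[j], bell[k]) for j in bell for k in bell}

# |Phi+> expressed in the Bell basis itself has a single non-zero coordinate.
phi_plus_in_bell = [inner(bell[name], bell["Phi+"]) for name in bell]
```

So $|\Phi^+\rangle$ is a superposition with respect to the standard basis but not with respect to the Bell basis.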

Any unit vector of the $2^n$-dimensional state space represents a possible state of an $n$-qubit
system, but just as in the single-qubit case there is redundancy. In the multiple-qubit case, not
only do vectors that are multiples of each other refer to the same quantum state, but properties of
the tensor product also mean that phase factors distribute over tensor products; the same phase
factor in different qubits of a tensor product represents the same state:

$$|v\rangle \otimes (e^{i\phi}|w\rangle) = e^{i\phi}(|v\rangle \otimes |w\rangle) = (e^{i\phi}|v\rangle) \otimes |w\rangle.$$

Phase factors in individual qubits of a single term of a superposition can always be factored out
into a single coefficient for that term.
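Example 3.1.4 $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \otimes \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) = \frac{1}{2}(|00\rangle + |01\rangle + |10\rangle + |11\rangle)$


Example 3.1.5 $\left(\frac{1}{2}|0\rangle + \frac{\sqrt{3}}{2}|1\rangle\right) \otimes \left(\frac{1}{\sqrt{2}}|0\rangle + \frac{i}{\sqrt{2}}|1\rangle\right) = \frac{1}{2\sqrt{2}}|00\rangle + \frac{i}{2\sqrt{2}}|01\rangle + \frac{\sqrt{3}}{2\sqrt{2}}|10\rangle + \frac{i\sqrt{3}}{2\sqrt{2}}|11\rangle$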
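
The way a phase factor moves freely across a tensor product can be verified in coordinates. The following is a minimal sketch with hypothetical helper names; the phase angle and the vectors `v` and `w` are arbitrary illustrative choices, and the first computation reproduces the product of two copies of $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ from Example 3.1.4.

```python
import cmath
import math

def kron(v, w):
    """Coordinates of |v> (x) |w> in dictionary order."""
    return [a * b for a in v for b in w]

def close(u, v, tol=1e-12):
    """Componentwise comparison of two coordinate lists."""
    return len(u) == len(v) and all(abs(x - y) < tol for x, y in zip(u, v))

r = 1 / math.sqrt(2)
plus = [r, r]
product = kron(plus, plus)             # Example 3.1.4: all four amplitudes 1/2

phase = cmath.exp(0.3j)                # arbitrary illustrative phase factor
v, w = [0.6, 0.8j], [r, -r]            # arbitrary unit vectors

left = kron(v, [phase * c for c in w])       # |v> (x) (e^{i phi}|w>)
middle = [phase * c for c in kron(v, w)]     # e^{i phi}(|v> (x) |w>)
right = kron([phase * c for c in v], w)      # (e^{i phi}|v>) (x) |w>
```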


Just as in the single-qubit case, vectors that differ only in a global phase represent the same
quantum state. If we write every quantum state as

$$a_0|0 \ldots 00\rangle + a_1|0 \ldots 01\rangle + \cdots + a_{2^n-1}|1 \ldots 11\rangle$$

and require the first non-zero $a_i$ to be real and non-negative, then every quantum state has a
unique representation. Since this representation uniquely represents quantum states, the quantum
state space of an $n$-qubit system has $2^n - 1$ complex dimensions. For any complex vector space of
dimension $N$, the space in which vectors that are multiples of each other are considered equivalent
is called complex projective space of dimension $N - 1$. So the space of distinct quantum states
of an $n$-qubit system is a complex projective space of dimension $2^n - 1$.

Just as in the single-qubit case, we must be careful not to confuse the vector space in which we
write our computations with the quantum state space itself. Again, we must be careful to avoid
confusion between the relative phases between terms in the superposition, of critical importance
in quantum mechanics, and the global phase which has no physical meaning. Using the notation
of section 2.5.1, we write $|v\rangle \sim |w\rangle$ when two vectors $|v\rangle$ and $|w\rangle$ differ only by a global phase
and thus represent the same quantum state. For example, even though $|00\rangle \sim e^{i\phi}|00\rangle$, the vectors
$|v\rangle = \frac{1}{\sqrt{2}}(e^{i\phi}|00\rangle + |11\rangle)$ and $|w\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ represent different quantum states
(unless $\phi$ is a multiple of $2\pi$), which behave differently in many situations:

$$\frac{1}{\sqrt{2}}(e^{i\phi}|00\rangle + |11\rangle) \not\sim \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle).$$

However,

$$\frac{1}{\sqrt{2}}(e^{i\phi}|00\rangle + e^{i\phi}|11\rangle) \sim \frac{e^{i\phi}}{\sqrt{2}}(|00\rangle + |11\rangle) \sim \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle).$$


Quantum mechanical calculations are usually performed in the vector space rather than in the
projective space because linearity makes vector spaces easier to work with. But we must always
be aware of the $\sim$ equivalence when we interpret the results of our calculations as quantum states.
Further confusions arise when states are written in different bases. Recall from section 2.5.1 that
$|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$. The expression $\frac{1}{\sqrt{2}}(|+\rangle + |-\rangle)$ is a different way
of writing $|0\rangle$, and $\frac{1}{\sqrt{2}}(|0\rangle|0\rangle + |1\rangle|1\rangle)$ and $\frac{1}{\sqrt{2}}(|+\rangle|+\rangle + |-\rangle|-\rangle)$ are simply different expressions
for the same vector.
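
These distinctions can be tested numerically. The sketch below rests on one standard fact not stated in the text: two unit vectors differ only by a global phase exactly when the absolute value of their inner product is 1. The helper names and the choice $\phi = \pi/2$ are illustrative assumptions.

```python
import cmath
import math

def kron(v, w):
    """Coordinates of |v> (x) |w> in dictionary order."""
    return [a * b for a in v for b in w]

def inner(u, v):
    """Inner product <u|v>, conjugating the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def same_state(u, v, tol=1e-12):
    """For unit vectors: u ~ v exactly when |<u|v>| = 1."""
    return abs(abs(inner(u, v)) - 1) < tol

r = 1 / math.sqrt(2)
phi = math.pi / 2                                    # illustrative phase angle
w = [r, 0, 0, r]                                     # (|00> + |11>)/sqrt(2)
relative = [cmath.exp(1j * phi) * r, 0, 0, r]        # phase on one term only
overall = [cmath.exp(1j * phi) * c for c in w]       # phase on every term

# (|+>|+> + |->|->)/sqrt(2) written out in the standard basis.
plus, minus = [r, r], [r, -r]
hadamard_form = [r * (p + m) for p, m in zip(kron(plus, plus), kron(minus, minus))]
```

The last computation yields the same coordinates as `w`, not merely an equivalent vector, which matches the statement that the two expressions denote the same vector.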
Fluency with properties of tensor products, and with the notation just presented, will be crucial
for understanding the rest of the book. The reader is strongly encouraged to work exercises 3.1 
through 3.9 at this point to begin to develop that fluency. 

3.2 Entangled States 

As we saw in section 2.5.2, a single-qubit state can be specified by a single complex number so any
tensor product of $n$ individual single-qubit states can be specified by $n$ complex numbers. But in the
last section, we saw that it takes $2^n - 1$ complex numbers to describe states of an $n$-qubit system.
Since $2^n \gg n$, the vast majority of $n$-qubit states cannot be described in terms of the state of $n$
separate single-qubit systems. States that cannot be written as the tensor product of $n$ single-qubit
states are called entangled states. Thus the vast majority of quantum states are entangled.


Example 3.2.1 The elements of the Bell basis (equation 3.1) are entangled. For instance, the Bell
state $|\Phi^+\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ cannot be described in terms of the state of each of its component
qubits separately. This state cannot be decomposed, because it is impossible to find $a_1$, $a_2$, $b_1$, $b_2$
such that

$$(a_1|0\rangle + b_1|1\rangle) \otimes (a_2|0\rangle + b_2|1\rangle) = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle),$$

since

$$(a_1|0\rangle + b_1|1\rangle) \otimes (a_2|0\rangle + b_2|1\rangle) = a_1a_2|00\rangle + a_1b_2|01\rangle + b_1a_2|10\rangle + b_1b_2|11\rangle$$

and $a_1b_2 = 0$ implies that either $a_1a_2 = 0$ or $b_1b_2 = 0$. Two particles in the Bell state $|\Phi^+\rangle$ are
called an EPR pair for reasons that will become apparent in section 4.4.
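
The argument in Example 3.2.1 generalizes to a simple test for two-qubit states, sketched below. It is not taken from the text: it uses the standard observation that $c_{00}|00\rangle + c_{01}|01\rangle + c_{10}|10\rangle + c_{11}|11\rangle$ is a tensor product exactly when $c_{00}c_{11} - c_{01}c_{10} = 0$, since a product state has $c_{00}c_{11} = a_1a_2b_1b_2 = c_{01}c_{10}$. The function name is hypothetical.

```python
import math

def is_product_two_qubit(c00, c01, c10, c11, tol=1e-12):
    """True if the two-qubit state factors as a tensor product of single-qubit states."""
    return abs(c00 * c11 - c01 * c10) < tol

r = 1 / math.sqrt(2)

bell_is_product = is_product_two_qubit(r, 0, 0, r)               # |Phi+>
plus_plus_is_product = is_product_two_qubit(0.5, 0.5, 0.5, 0.5)  # |+>|+>
basis_is_product = is_product_two_qubit(0, 1, 0, 0)              # |01>
```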


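Example 3.2.2 Other examples of two-qubit entangled states include

$$|\Psi^+\rangle = \frac{1}{\sqrt{2}}(|01\rangle + |10\rangle),$$

$$\frac{1}{\sqrt{2}}(|00\rangle - i|11\rangle),$$

$$\frac{i}{10}|00\rangle + \frac{\sqrt{99}}{10}|11\rangle,$$

and

$$\frac{7}{10}|00\rangle + \frac{1}{10}|01\rangle + \frac{1}{10}|10\rangle + \frac{7}{10}|11\rangle.$$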
The four entangled states

$$\begin{aligned}
|\Phi^+\rangle &= \tfrac{1}{\sqrt{2}}(|00\rangle + |11\rangle) \\
|\Phi^-\rangle &= \tfrac{1}{\sqrt{2}}(|00\rangle - |11\rangle) \\
|\Psi^+\rangle &= \tfrac{1}{\sqrt{2}}(|01\rangle + |10\rangle) \\
|\Psi^-\rangle &= \tfrac{1}{\sqrt{2}}(|01\rangle - |10\rangle)
\end{aligned}$$

of equation 3.1 are called Bell states. Bell states are of fundamental importance to quantum
information processing. For example, section 5.3 exhibits their use for quantum teleportation and
dense coding. Section 10.2.1 shows that these states are maximally entangled.

Strictly speaking, entanglement is always with respect to a specified tensor product
decomposition of the state space. More formally, given a state $|\psi\rangle$ of some quantum system with associated
vector space $V$ and a tensor decomposition of $V$, $V = V_1 \otimes \cdots \otimes V_n$, the state $|\psi\rangle$ is separable,
or unentangled, with respect to that decomposition if it can be written as

$$|\psi\rangle = |v_1\rangle \otimes \cdots \otimes |v_n\rangle,$$

where $|v_i\rangle$ is contained in $V_i$. Otherwise, $|\psi\rangle$ is entangled with respect to this decomposition.

Unless we specify a different decomposition, when we say an $n$-qubit state is entangled, we
mean it is entangled with respect to the tensor product decomposition of the vector space $V$
associated to the $n$-qubit system into the $n$ two-dimensional vector spaces $V_{n-1}, \ldots, V_0$ associated
with each of the individual qubits. For such statements to have meaning, it must be specified or
clear from context which of the many possible tensor decompositions of $V$ into two-dimensional
spaces corresponds with the set of qubits under consideration.

It is vital to remember that entanglement is not an absolute property of a quantum state, but
depends on the particular decomposition of the system into subsystems under consideration; states
entangled with respect to the single-qubit decomposition may be unentangled with respect to other
decompositions into subsystems. In particular, when discussing entanglement in quantum
computation, we will be interested in entanglement with respect to a decomposition into registers,
subsystems consisting of multiple qubits, as well as entanglement with respect to the
decomposition into individual qubits. The following example demonstrates how a state can be entangled
with respect to one decomposition and not with respect to another.


Example 3.2.3 Multiple meanings of entanglement. We say that the four-qubit state

$$|\psi\rangle = \frac{1}{2}(|00\rangle + |11\rangle + |22\rangle + |33\rangle) = \frac{1}{2}(|0000\rangle + |0101\rangle + |1010\rangle + |1111\rangle)$$

is entangled, since it cannot be expressed as the tensor product of four single-qubit states. That the
entanglement is with respect to the decomposition into single qubits is implicit in this statement.
There are other decompositions with respect to which this state is unentangled. For example, $|\psi\rangle$
can be expressed as the product of two two-qubit states:

$$|\psi\rangle = \frac{1}{2}\left(|0\rangle_1|0\rangle_2|0\rangle_3|0\rangle_4 + |0\rangle_1|1\rangle_2|0\rangle_3|1\rangle_4 + |1\rangle_1|0\rangle_2|1\rangle_3|0\rangle_4 + |1\rangle_1|1\rangle_2|1\rangle_3|1\rangle_4\right)$$

$$= \frac{1}{\sqrt{2}}\left(|0\rangle_1|0\rangle_3 + |1\rangle_1|1\rangle_3\right) \otimes \frac{1}{\sqrt{2}}\left(|0\rangle_2|0\rangle_4 + |1\rangle_2|1\rangle_4\right),$$

where the subscripts indicate which qubit we are talking about. So $|\psi\rangle$ is not entangled with
respect to the system decomposition consisting of a subsystem of the first and third qubit and a
subsystem consisting of the second and fourth qubit. On the other hand, the reader can check that
$|\psi\rangle$ is entangled with respect to the decomposition into the two two-qubit systems consisting of
the first and second qubits and the third and fourth qubits.
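
The two claims of this example can be checked numerically. The following is a minimal sketch,
not part of the original text: it assumes qubits are indexed 0 to 3 (so "first and third qubit"
becomes indices 0 and 2), and the helper names `amplitudes` and `is_product` are hypothetical. It
uses the standard fact that a pure state factors across a two-part decomposition exactly when its
amplitudes, arranged as a matrix with one index per part, have rank at most 1, that is, when all
2x2 minors vanish.

```python
from itertools import product

def amplitudes():
    """Amplitudes of 1/2 (|0000> + |0101> + |1010> + |1111>), keyed by bit tuples."""
    amp = {bits: 0.0 for bits in product((0, 1), repeat=4)}
    for bits in [(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 1, 0), (1, 1, 1, 1)]:
        amp[bits] = 0.5
    return amp

def is_product(amp, part_a, part_b, tol=1e-12):
    """True if the state is a tensor product of a state on the qubits in part_a
    and a state on the qubits in part_b (qubit indices are 0-based)."""
    rows = list(product((0, 1), repeat=len(part_a)))
    cols = list(product((0, 1), repeat=len(part_b)))

    def entry(r, c):
        bits = [0] * (len(part_a) + len(part_b))
        for q, b in zip(part_a, r):
            bits[q] = b
        for q, b in zip(part_b, c):
            bits[q] = b
        return amp[tuple(bits)]

    m = [[entry(r, c) for c in cols] for r in rows]
    # Rank <= 1 if and only if every 2x2 minor is zero.
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            for k in range(len(cols)):
                for l in range(k + 1, len(cols)):
                    if abs(m[i][k] * m[j][l] - m[i][l] * m[j][k]) > tol:
                        return False
    return True

amp = amplitudes()
unentangled_13_24 = is_product(amp, (0, 2), (1, 3))   # qubits {1,3} versus {2,4}
unentangled_12_34 = is_product(amp, (0, 1), (2, 3))   # qubits {1,2} versus {3,4}
```

Under these assumptions the first check succeeds and the second fails, matching the statement
that the same state is unentangled for one decomposition and entangled for another.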


It is important to recognize that the notion of entanglement is not basis dependent, even though 
it depends on the tensor decomposition under consideration; there is no reference, explicit or 
implicit, to a basis in the definition of entanglement. Certain bases may be more or less convenient 
to work with, depending for instance on how much they reflect the tensor decomposition under 
consideration, but that choice does not affect what states are considered entangled. 

In section 2.3, we puzzled over the meaning of quantum superpositions. We now extend the
remarks we made on the meaning of superpositions in section 2.3 to the multiple-qubit case. As
in the single-qubit case, most $n$-qubit states are superpositions, nontrivial linear combinations
of basis vectors. As always, the notion of superposition is basis-dependent; all states are
superpositions with respect to some bases, and not superpositions with respect to other bases. For
multiple qubits, the answer to the question of what superpositions mean is more involved than in
the single-qubit case.

The common way of talking about superpositions in terms of the system being in two states
“at the same time” is even more suspect in the multiple-qubit case. This way of thinking fails to
distinguish between states like $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ and $\frac{1}{\sqrt{2}}(|00\rangle + i|11\rangle)$ that differ only by a relative
phase and behave differently under a variety of circumstances. Furthermore, which states a system
is viewed as “being in at the same time” is basis-dependent; the expressions $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ and
$\frac{1}{\sqrt{2}}(|+\rangle|+\rangle + |-\rangle|-\rangle)$ represent the same state but have different interpretations, one as being in
the states $|00\rangle$ and $|11\rangle$ at the same time, and the other as being in the states $|{+}{+}\rangle$ and $|{-}{-}\rangle$ at the
same time, in spite of being the same state and thus behaving in precisely the same way under all
circumstances. This example underscores that quantum superpositions are not probabilistic mixtures.
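
Both observations in this paragraph can be checked with a few lines of arithmetic. This is a
minimal sketch, not part of the original text: states are represented as plain Python lists of
amplitudes in the order 00, 01, 10, 11, and the helper names (`kron`, `inner`, and so on) are
illustrative assumptions.

```python
import math

def kron(u, v):
    """Tensor product of two amplitude lists."""
    return [x * y for x in u for y in v]

def add(u, v):
    return [x + y for x, y in zip(u, v)]

def scale(c, u):
    return [c * x for x in u]

def inner(u, v):
    """Inner product <u|v>, conjugating the first argument."""
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))

s = 1 / math.sqrt(2)
zero, one = [1, 0], [0, 1]
plus, minus = [s, s], [s, -s]

bell_std = scale(s, add(kron(zero, zero), kron(one, one)))      # (|00> + |11>)/sqrt(2)
bell_had = scale(s, add(kron(plus, plus), kron(minus, minus)))  # (|+>|+> + |->|->)/sqrt(2)
phased = scale(s, add(kron(zero, zero), scale(1j, kron(one, one))))  # (|00> + i|11>)/sqrt(2)

# The two expressions describe the same vector.
same_vector = all(abs(x - y) < 1e-12 for x, y in zip(bell_std, bell_had))

# The relative phase matters: probability of the outcome |+>|+> when both
# qubits are measured in the Hadamard basis.
p_bell = abs(inner(kron(plus, plus), bell_std)) ** 2
p_phased = abs(inner(kron(plus, plus), phased)) ** 2
```

In this sketch `same_vector` is true, while `p_bell` and `p_phased` come out as 1/2 and 1/4, so
the two states that differ only by a relative phase are distinguishable by measurement.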

Sections 3.4 and 4.4 will illustrate how the basis dependence of this interpretation obscures
an essential part of the quantum nature of these states, an aspect that becomes apparent only
when such states are considered in different bases. Nevertheless, as long as one is aware that this
description should not be taken too literally, it can be helpful at first to think of superpositions as
being in multiple states at once. Over the course of this chapter and the next, you will begin to
develop more of a feel for the workings of these states.

Not only is entanglement between qubits key to the exponential size of quantum state spaces 
of multiple-qubit systems, but, as we will see in sections 3.4, 5.3.1, and 5.3.2, particles in an 
entangled state can also be used to aid communication of both classical and quantum information. 
Furthermore, the quantum algorithms of part II exploit entanglement to speed up computation. 
The way entangled states behave when measured is one of the central mysteries of quantum 
mechanics, as well as a source of power for quantum information processing. Entanglement and 
quantum measurement are two of the uniquely quantum properties that are exploited in quantum 
information processing. 

3.3 Basics of Multi-Qubit Measurement 

The experiment of section 2.1.2 illustrates how measurement of a single qubit is probabilistic 
and transforms the quantum state into a state compatible with the measuring device. A similar 
statement is true for measurements of multiple-qubit systems, except that the set of possible 
measurements and measurement outcomes is significantly richer than in the single-qubit case. 
The next paragraph develops some mathematical formalism to handle the general case. 

Let $V$ be the $N = 2^n$ dimensional vector space associated with an $n$-qubit system. Any device
that measures this system has an associated direct sum decomposition into orthogonal subspaces

$$V = S_1 \oplus \cdots \oplus S_k$$

for some $k \le N$. The number $k$ corresponds to the maximum number of possible measurement
outcomes for a state measured with that particular device. This number varies from device to device,
even between devices measuring the same system. That any device has an associated direct sum
decomposition is a direct generalization of the single-qubit case. Every device measuring a
single-qubit system has an associated orthonormal basis $\{|v_1\rangle, |v_2\rangle\}$ for the vector space $V$ associated
with the single-qubit system; the vectors $|v_i\rangle$ each generate a one-dimensional subspace $S_i$
(consisting of all multiples $a|v_i\rangle$ where $a$ is a complex number), and $V = S_1 \oplus S_2$. Furthermore, the
only nontrivial decompositions of the vector space $V$ are into two one-dimensional subspaces, and
any choice of unit length vectors, one from each of the subspaces, yields an orthonormal basis.

When a measuring device with associated direct sum decomposition $V = S_1 \oplus \cdots \oplus S_k$
interacts with an $n$-qubit system in state $|\psi\rangle$, the interaction changes the state to one entirely contained
within one of the subspaces, and chooses the subspace with probability equal to the square of the
absolute value of the amplitude of the component of $|\psi\rangle$ in that subspace. More formally, the
state $|\psi\rangle$ has a unique direct sum decomposition $|\psi\rangle = a_1|\psi_1\rangle \oplus \cdots \oplus a_k|\psi_k\rangle$, where $|\psi_i\rangle$ is a
unit vector in $S_i$ and $a_i$ is real and non-negative. When $|\psi\rangle$ is measured, the state $|\psi_i\rangle$ is obtained
with probability $|a_i|^2$. That any measuring device has an associated direct sum decomposition,
and that the interaction can be modeled in this way, is an axiom of quantum mechanics. It is not
possible to prove that every device behaves in this way, but so far it has provided an excellent
model that predicts the outcome of experiments with high accuracy.
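
The measurement rule just stated can be written out as a short computation. This is a minimal
sketch under stated assumptions, not a definitive implementation: a state is a list of complex
amplitudes, each subspace $S_i$ is given by a list of orthonormal vectors spanning it, and the
function names `project` and `measure_outcomes` are hypothetical.

```python
import math

def inner(u, v):
    """Inner product <u|v>, conjugating the first argument."""
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))

def project(state, basis):
    """Component of `state` in the subspace spanned by the orthonormal list `basis`."""
    out = [0j] * len(state)
    for b in basis:
        c = inner(b, state)
        out = [o + c * x for o, x in zip(out, b)]
    return out

def measure_outcomes(state, subspaces):
    """For each subspace S_i return (probability |a_i|^2, unit vector |psi_i> or None)."""
    results = []
    for basis in subspaces:
        comp = project(state, basis)
        a = math.sqrt(sum(abs(x) ** 2 for x in comp))   # a_i is real and non-negative
        post = [x / a for x in comp] if a > 1e-12 else None
        results.append((a * a, post))
    return results

# Single qubit 0.6|0> + 0.8i|1> measured in the standard basis:
# S_1 is spanned by |0>, S_2 by |1>.
outcomes = measure_outcomes([0.6, 0.8j], [[[1, 0]], [[0, 1]]])
```

Note that any phase of the component ends up in the unit vector $|\psi_i\rangle$, which is what allows
$a_i$ to be taken real and non-negative.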


Example 3.3.1 Single-qubit measurement in the standard basis. Let $V$ be the vector space
associated with a single-qubit system. A device that measures a qubit in the standard basis has, by
definition, the associated direct sum decomposition $V = S_1 \oplus S_2$, where $S_1$ is generated by $|0\rangle$
and $S_2$ is generated by $|1\rangle$. An arbitrary state $|\psi\rangle = a|0\rangle + b|1\rangle$ measured by such a device will
be $|0\rangle$ with probability $|a|^2$, the squared absolute value of the amplitude of $|\psi\rangle$ in the subspace $S_1$,
and $|1\rangle$ with probability $|b|^2$.


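Example 3.3.2 Single-qubit measurement in the Hadamard basis. A device that measures a single
qubit in the Hadamard basis

$$\left\{|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle),\; |-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)\right\}$$

has associated subspace decomposition $V = S_+ \oplus S_-$, where $S_+$ is generated by $|+\rangle$ and $S_-$ is
generated by $|-\rangle$. A state $|\psi\rangle = a|0\rangle + b|1\rangle$ can be rewritten as $|\psi\rangle = \frac{a+b}{\sqrt{2}}|+\rangle + \frac{a-b}{\sqrt{2}}|-\rangle$, so the
probability that $|\psi\rangle$ is measured as $|+\rangle$ will be $\left|\frac{a+b}{\sqrt{2}}\right|^2$ and as $|-\rangle$ will be $\left|\frac{a-b}{\sqrt{2}}\right|^2$.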
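
The rewriting of $a|0\rangle + b|1\rangle$ in the Hadamard basis used in this example follows from
expressing the standard basis vectors in terms of $|+\rangle$ and $|-\rangle$. The intermediate step,
spelled out as a short derivation:

```latex
\begin{align*}
|0\rangle &= \tfrac{1}{\sqrt{2}}\bigl(|+\rangle + |-\rangle\bigr), \qquad
|1\rangle = \tfrac{1}{\sqrt{2}}\bigl(|+\rangle - |-\rangle\bigr), \\
a|0\rangle + b|1\rangle
  &= \tfrac{a}{\sqrt{2}}\bigl(|+\rangle + |-\rangle\bigr)
   + \tfrac{b}{\sqrt{2}}\bigl(|+\rangle - |-\rangle\bigr)
   = \tfrac{a+b}{\sqrt{2}}\,|+\rangle + \tfrac{a-b}{\sqrt{2}}\,|-\rangle, \\
\left|\tfrac{a+b}{\sqrt{2}}\right|^2 + \left|\tfrac{a-b}{\sqrt{2}}\right|^2
  &= |a|^2 + |b|^2 = 1 .
\end{align*}
```

The last line confirms that the two outcome probabilities sum to 1 for any unit vector.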


The next two examples describe measurements of two-qubit states that are used in the 
entanglement-based quantum key distribution protocol described in section 3.4. Chapter 4 
explores measurement of multiple-qubit systems in more detail and builds up the standard 
notational shorthand for describing quantum measurements. 


Example 3.3.3 Measurement of the first qubit of a two-qubit state in the standard basis.
Let $V$ be the vector space associated with a two-qubit system. A device that measures the
first qubit in the standard basis has associated subspace decomposition $V = S_1 \oplus S_2$ where
$S_1 = |0\rangle \otimes V_2$, the two-dimensional subspace spanned by $\{|00\rangle, |01\rangle\}$, and $S_2 = |1\rangle \otimes V_2$, which
is spanned by $\{|10\rangle, |11\rangle\}$. To see what happens when such a device measures an arbitrary
two-qubit state $|\psi\rangle = a_{00}|00\rangle + a_{01}|01\rangle + a_{10}|10\rangle + a_{11}|11\rangle$, we write $|\psi\rangle = c_1|\psi_1\rangle + c_2|\psi_2\rangle$
where $|\psi_1\rangle = \frac{1}{c_1}(a_{00}|00\rangle + a_{01}|01\rangle) \in S_1$ and $|\psi_2\rangle = \frac{1}{c_2}(a_{10}|10\rangle + a_{11}|11\rangle) \in S_2$, with
$c_1 = \sqrt{|a_{00}|^2 + |a_{01}|^2}$ and $c_2 = \sqrt{|a_{10}|^2 + |a_{11}|^2}$ as the normalization factors. Measurement of $|\psi\rangle$ with
this device results in the state $|\psi_1\rangle$ with probability $|c_1|^2 = |a_{00}|^2 + |a_{01}|^2$ and the state $|\psi_2\rangle$ with
probability $|c_2|^2 = |a_{10}|^2 + |a_{11}|^2$. In particular, when the Bell state $|\Phi^+\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ is
measured, we obtain $|00\rangle$ and $|11\rangle$ with equal probability.


Example 3.3.4 Measurement of the first qubit of a two-qubit state in the Hadamard basis. A
device that measures the first qubit of a two-qubit system with respect to the Hadamard basis
$\{|+\rangle, |-\rangle\}$ has an associated direct sum decomposition $V = S'_1 \oplus S'_2$ where $S'_1 = |+\rangle \otimes V_2$, the
two-dimensional subspace spanned by $\{|+\rangle|0\rangle, |+\rangle|1\rangle\}$, and $S'_2 = |-\rangle \otimes V_2$. We write $|\psi\rangle =
a_{00}|00\rangle + a_{01}|01\rangle + a_{10}|10\rangle + a_{11}|11\rangle$ as $|\psi\rangle = c'_1|\psi'_1\rangle + c'_2|\psi'_2\rangle$, where

$$|\psi'_1\rangle = \frac{1}{c'_1}\left(\frac{a_{00} + a_{10}}{\sqrt{2}}|+\rangle|0\rangle + \frac{a_{01} + a_{11}}{\sqrt{2}}|+\rangle|1\rangle\right)$$

and

$$|\psi'_2\rangle = \frac{1}{c'_2}\left(\frac{a_{00} - a_{10}}{\sqrt{2}}|-\rangle|0\rangle + \frac{a_{01} - a_{11}}{\sqrt{2}}|-\rangle|1\rangle\right).$$

We leave it to the reader to calculate $c'_1$ and $c'_2$ and the probabilities for the two outcomes, and
to show that such a measurement on the state $|\Phi^+\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ yields $|+\rangle|+\rangle$ and $|-\rangle|-\rangle$
with equal probability.
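
The stated results for the Bell state in the last two examples can be checked numerically. The
following is a minimal sketch, not part of the original text: amplitudes are listed in the order
00, 01, 10, 11, and the function name `measure_first_qubit` is a hypothetical helper.

```python
import math

s = 1 / math.sqrt(2)

def kron(u, v):
    """Tensor product of two amplitude lists."""
    return [x * y for x in u for y in v]

def inner(u, v):
    """Inner product <u|v>, conjugating the first argument."""
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))

def measure_first_qubit(state, b0, b1):
    """Measure the first qubit of a two-qubit state in the single-qubit basis {b0, b1}.
    Returns a list of (probability, normalized post-measurement state)."""
    zero, one = [1, 0], [0, 1]
    results = []
    for b in (b0, b1):
        # Orthonormal basis of the two-dimensional subspace b (x) V_2.
        basis = [kron(b, zero), kron(b, one)]
        comp = [0j] * 4
        for e in basis:
            c = inner(e, state)
            comp = [o + c * x for o, x in zip(comp, e)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in comp))
        post = [x / norm for x in comp] if norm > 1e-12 else None
        results.append((norm ** 2, post))
    return results

bell = [s, 0, 0, s]                                   # (|00> + |11>)/sqrt(2)
std = measure_first_qubit(bell, [1, 0], [0, 1])       # standard basis
had = measure_first_qubit(bell, [s, s], [s, -s])      # Hadamard basis
```

In both cases each outcome has probability 1/2. In the Hadamard case the post-measurement
vectors are $(\tfrac12, \tfrac12, \tfrac12, \tfrac12)$ and $(\tfrac12, -\tfrac12, -\tfrac12, \tfrac12)$, which are the
standard-basis coordinates of $|+\rangle|+\rangle$ and $|-\rangle|-\rangle$.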


3.4 Quantum Key Distribution Using Entangled States 

In 1991, Artur Ekert developed a quantum key distribution scheme that makes use of special 
properties of entangled states. The Ekert 91 protocol resembles the BB84 protocol of section 2.4 
in some ways. In his protocol, Alice and Bob establish a shared key by separately performing 
random measurements on their halves of an EPR pair and then comparing which bases they used 
over a classical channel. 

Because Alice and Bob do not exchange quantum states during the protocol, and an eavesdropper
Eve cannot learn anything useful by listening in on the classical exchange alone, Eve’s only
chance to obtain information about the key is for her to interact with the purported EPR pair as it
is being created or transmitted in the setup for the protocol. For this reason it is easier to prove
the security of protocols based on entangled states. Such proofs have then been modified to prove
the security of other QKD protocols like BB84. As with BB84, we describe only the protocol;
tools developed in later chapters are needed to describe many of Eve’s possible attacks and to
give a proof of security. Exercise 3.15 analyzes the limited effectiveness of some simple attacks
Eve could make.

The protocol begins with the creation of a sequence of pairs of qubits, all in the entangled state
$|\Phi^+\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. Alice receives the first qubit of each pair, while Bob receives the second.
When they wish to create a secret key, for each qubit they both independently and randomly
choose either the standard basis $\{|0\rangle, |1\rangle\}$ or the Hadamard basis $\{|+\rangle, |-\rangle\}$ in which to measure,
just as in the BB84 protocol. After they have made their measurements, they compare bases and
discard those bits for which their bases differ.

If Alice measures the first qubit in the standard basis and obtains $|0\rangle$, then the entire state
becomes $|00\rangle$. If Bob now measures in the standard basis, he obtains the result $|0\rangle$ with certainty.
If instead he measures in the Hadamard basis $\{|+\rangle, |-\rangle\}$, he obtains $|+\rangle$ and $|-\rangle$ with equal
probability, since $|00\rangle = |0\rangle\left(\frac{1}{\sqrt{2}}(|+\rangle + |-\rangle)\right)$. Just as in the BB84 protocol, he interprets the
states $|+\rangle$ and $|-\rangle$ as corresponding to the classical bit values 0 and 1 respectively; thus when he
measures in the basis $\{|+\rangle, |-\rangle\}$ and Alice measures in the standard basis, he obtains the same bit
value as Alice only half the time. The behavior is similar when Alice’s measurement indicates her
qubit is in state $|1\rangle$. If instead Alice measures in the Hadamard basis and obtains the result that her
qubit is in the state $|+\rangle$, the whole state becomes $|+\rangle|+\rangle$. If Bob now measures in the Hadamard
basis, he obtains $|+\rangle$ with certainty, whereas if he measures in the standard basis he obtains $|0\rangle$
and $|1\rangle$ with equal probability. Since they always get the same bit value if they measure in the
same basis, the protocol results in a shared random key, as long as the initial pairs were EPR
pairs. The security of the scheme relies on adding steps to the protocol we have just described that
enable Alice and Bob to test the fidelity of their EPR pairs. We are not yet in a position to describe
such tests. The tests Ekert suggested are based on Bell’s inequalities (section 4.4.3). Other, more
efficient tests have been devised.
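
The basis-comparison step of the protocol can be illustrated with a small simulation. This is a
minimal sketch under strong assumptions, not a faithful quantum simulation: it reproduces only
the ideal outcome statistics described above (perfect EPR pairs, no eavesdropper, no fidelity
tests), and the function name `ekert_sift` and the basis labels `'S'` and `'H'` are hypothetical.

```python
import random

def ekert_sift(n_pairs, rng):
    """Simulate the ideal measurement statistics for n_pairs EPR pairs.
    'S' is the standard basis, 'H' the Hadamard basis.
    Returns the sifted keys of Alice and Bob as lists of bits."""
    alice_key, bob_key = [], []
    for _ in range(n_pairs):
        a_basis = rng.choice("SH")
        b_basis = rng.choice("SH")
        a_bit = rng.randint(0, 1)        # either outcome occurs with probability 1/2
        if b_basis == a_basis:
            b_bit = a_bit                # same basis: the results always agree
        else:
            b_bit = rng.randint(0, 1)    # different bases: Bob's result is uniformly random
        if a_basis == b_basis:           # public comparison of bases; keep matching rounds only
            alice_key.append(a_bit)
            bob_key.append(b_bit)
    return alice_key, bob_key

alice, bob = ekert_sift(1000, random.Random(0))
keys_agree = alice == bob
```

Roughly half of the rounds survive the comparison on average, since the two bases are chosen
independently, and the surviving bits agree in every round under these idealized assumptions.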

This protocol has the intriguing property that in theory Alice and Bob can prepare shared keys 
as they need them, never needing to store keys for any length of time. In practice, to prepare keys 
on an as-needed basis in this way, Alice and Bob would need to be able to store their EPR pairs 
so that they are not corrupted during that time. The capability of long-term reliable storage of 
entangled states does not exist at present. 

3.5 References 

In the early 1980s, Richard Feynman and Yuri Manin separately recognized that certain
quantum phenomena associated with entangled particles could not be simulated efficiently on
standard computers. Turning this observation around caused them to speculate whether these
quantum phenomena could be used to speed up computation in general. Their early musings
on quantum computation can be found in [121], [150], [202], and [203].

More extensive treatments of the tensor product can be found in Arno Bohm’s Quantum 
Mechanics [53], Paul Bamberg and Shlomo Sternberg’s A Course in Mathematics for Students of 
Physics [30], and Thomas Hungerford’s Algebra [158]. 

Ekert’s key distribution protocol based on EPR pairs, originally proposed in [111], has been
demonstrated in the laboratory [163, 294]. Gisin et al. [130] provide a detailed survey of work
on quantum key distribution including Ekert’s algorithm.

3.6 Exercises 

Exercise 3.1. Let $V$ be a vector space with basis $\{(1, 0, 0), (0, 1, 0), (0, 0, 1)\}$. Give two different
bases for $V \otimes V$.

Exercise 3.2. Show by example that a linear combination of entangled states is not necessarily
entangled.

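Exercise 3.3. Show that the state

$$|W_n\rangle = \frac{1}{\sqrt{n}}(|0\ldots001\rangle + |0\ldots010\rangle + |0\ldots100\rangle + \cdots + |1\ldots000\rangle)$$

is entangled, with respect to the decomposition into the $n$ qubits, for every $n > 1$.

Exercise 3.4. Show that the state

$$|GHZ_n\rangle = \frac{1}{\sqrt{2}}(|00\ldots0\rangle + |11\ldots1\rangle)$$

is entangled, with respect to the decomposition into the $n$ qubits, for every $n > 1$.

Exercise 3.5. Is the state $\frac{1}{\sqrt{2}}(|0\rangle|+\rangle + |1\rangle|-\rangle)$ entangled?

Exercise 3.6. If someone asks you whether the state $|+\rangle$ is entangled, what will you say?

Exercise 3.7. Write the following states in terms of the Bell basis:

a. $|00\rangle$

b. $|+\rangle|-\rangle$

c. $\frac{1}{\sqrt{3}}(|00\rangle + |01\rangle + |10\rangle)$

Exercise 3.8.

a. Show that $\frac{1}{\sqrt{2}}(|0\rangle|0\rangle + |1\rangle|1\rangle)$ and $\frac{1}{\sqrt{2}}(|+\rangle|+\rangle + |-\rangle|-\rangle)$ refer to the same quantum state.

b. Show that $\frac{1}{\sqrt{2}}(|0\rangle|0\rangle - |1\rangle|1\rangle)$ refers to the same state as $\frac{1}{\sqrt{2}}(|\mathbf{i}\rangle|\mathbf{i}\rangle + |{-\mathbf{i}}\rangle|{-\mathbf{i}}\rangle)$.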

Exercise 3.9.

a. Show that any $n$-qubit quantum state can be represented by a vector of the form

$$a_0|0\ldots00\rangle + a_1|0\ldots01\rangle + \cdots + a_{2^n-1}|1\ldots11\rangle$$

where the first non-zero $a_i$ is real and non-negative.

b. Show that this representation is unique in the sense that any two different vectors of this form
represent different quantum states.

Exercise 3.10. Show that for any orthonormal basis $B = \{|\beta_1\rangle, \ldots, |\beta_n\rangle\}$ and vectors
$|v\rangle = a_1|\beta_1\rangle + a_2|\beta_2\rangle + \cdots + a_n|\beta_n\rangle$ and $|w\rangle = c_1|\beta_1\rangle + c_2|\beta_2\rangle + \cdots + c_n|\beta_n\rangle$

a. the inner product of $|v\rangle$ and $|w\rangle$ is $c_1\overline{a_1} + c_2\overline{a_2} + \cdots + c_n\overline{a_n}$, and

b. the length squared of $|v\rangle$ is $\bigl||v\rangle\bigr|^2 = \langle v|v\rangle = |a_1|^2 + |a_2|^2 + \cdots + |a_n|^2$.

Write all steps in Dirac’s bra/ket notation.

Exercise 3.11. Let $|\psi\rangle$ be an $n$-qubit state. Show that the sum of the distances from $|\psi\rangle$ to the
standard basis vectors $|j\rangle$ is bounded below by a positive constant that depends only on $n$,

$$\sum_j \bigl||\psi\rangle - |j\rangle\bigr| \ge C,$$

where $|v|$ indicates the length of the enclosed vector. Specify such a constant $C$ in terms of $n$.

Exercise 3.12. Give an example of a two-qubit state that is a superposition with respect to the 
standard basis but that is not entangled. 

Exercise 3.13.

a. Show that the four-qubit state $|\psi\rangle = \frac{1}{2}(|00\rangle + |11\rangle + |22\rangle + |33\rangle)$ of example 3.2.3 is entangled
with respect to the decomposition into two two-qubit subsystems consisting of the first and second
qubits and the third and fourth qubits.

b. For the four decompositions into two subsystems consisting of one and three qubits, say
whether $|\psi\rangle$ is entangled or unentangled with respect to each of these decompositions.

Exercise 3.14.

a. For the standard basis, the Hadamard basis, and the basis $B = \{\frac{1}{\sqrt{2}}(|0\rangle + i|1\rangle), \frac{1}{\sqrt{2}}(|0\rangle - i|1\rangle)\}$,
determine the probability of each outcome when the second qubit of a two-qubit system in the
state $|00\rangle$ is measured in each of the bases.

b. Determine the probability of each outcome when the second qubit of the state $|00\rangle$ is first
measured in the Hadamard basis and then in the basis $B$ of part a).

c. Determine the probability of each outcome when the second qubit of the state $|00\rangle$ is first
measured in the Hadamard basis and then in the standard basis.

Exercise 3.15. This exercise analyzes the effectiveness of some simple attacks an eavesdropper
Eve could make on Ekert’s entangled-state-based QKD protocol.

a. Say Eve can measure Bob’s half of each of the EPR pairs before it reaches him. Say she always
measures in the standard basis. Describe a method by which Alice and Bob can determine that
there is only a $2^{-s}$ chance that this sort of interference by Eve has gone undetected. What happens
if Eve instead measures each qubit randomly in either the standard basis or the Hadamard basis?
What happens if she uniformly at random chooses a basis from all possible bases?

b. Say Eve can pose as the entity sending the purported EPR pairs. Say instead of sending EPR
pairs she sends a random mixture of qubit pairs in the states $|00\rangle$, $|11\rangle$, $|+\rangle|+\rangle$, and $|-\rangle|-\rangle$.
After Alice and Bob perform the protocol of section 3.4, on how many bits on average do their
purported shared secret keys agree? On average, how many of these bits does Eve know?


The nonclassical behavior of quantum measurement is critical to quantum information processing
applications. This chapter develops the standard formalism used for measurement of
multiple-qubit systems, and uses this formalism to describe the highly nonclassical behavior of entangled
states under measurement. In particular, it discusses the EPR paradox and Bell’s theorem, which
illustrate the nonclassical nature of these states. Section 4.1 extends the Dirac bra/ket notation to
linear transformations. It will be used in this chapter to describe measurements, and in chapter 5
to describe quantum transformations acting on quantum systems. Section 4.2 slowly introduces
some of the notation and standard formalism for quantum measurement. Section 4.3 uses this
material to give a full description of the standard formalism. Both sections contain a myriad of
examples. The chapter concludes with a detailed discussion in section 4.4 of the behavior under
measurement of the most famous of entangled states, EPR pairs.

4.1 Dirac's Bra/Ket Notation for Linear Transformations 


Dirac’s bra/ket notation provides a convenient way of specifying linear transformations on
quantum states. Recall from section 2.2 that the conjugate transpose of the vector denoted by ket $|\psi\rangle$ is
denoted by bra $\langle\psi|$, and the inner product of vectors $|\psi\rangle$ and $|\phi\rangle$ is given by $\langle\psi|\phi\rangle$. The notation
$|x\rangle\langle y|$ represents the outer product of the vectors $|x\rangle$ and $|y\rangle$. Matrix multiplication is associative,
and scalars commute with everything, so relations such as the following hold:

$$(|a\rangle\langle b|)|c\rangle = |a\rangle(\langle b||c\rangle) = (\langle b|c\rangle)|a\rangle.$$

Let $V$ be a vector space associated with a single-qubit system. The matrix for the operator
$|0\rangle\langle 0|$ with respect to the standard basis in the standard order $\{|0\rangle, |1\rangle\}$ is

$$|0\rangle\langle 0| = \begin{pmatrix} 1 \\ 0 \end{pmatrix}\begin{pmatrix} 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.$$

The notation $|0\rangle\langle 1|$ represents the linear transformation that maps $|1\rangle$ to $|0\rangle$ and $|0\rangle$ to the null
vector, a relationship suggested by the notation:

$$(|0\rangle\langle 1|)|1\rangle = |0\rangle(\langle 1|1\rangle) = |0\rangle(1) = |0\rangle,$$

$$(|0\rangle\langle 1|)|0\rangle = |0\rangle(\langle 1|0\rangle) = |0\rangle(0) = 0.$$

Similarly

$$|1\rangle\langle 0| = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix},$$

$$|1\rangle\langle 1| = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$

Thus, all two-dimensional linear transformations on $V$ can be written in Dirac’s notation:

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} = a|0\rangle\langle 0| + b|0\rangle\langle 1| + c|1\rangle\langle 0| + d|1\rangle\langle 1|.$$

Example 4.1.1 The linear transformation that exchanges $|0\rangle$ and $|1\rangle$ is given by

$$X = |0\rangle\langle 1| + |1\rangle\langle 0|.$$

We will also use notation

$$X : |0\rangle \mapsto |1\rangle$$

$$|1\rangle \mapsto |0\rangle,$$

which specifies a linear transformation in terms of its effect on the basis vectors. The
transformation $X = |0\rangle\langle 1| + |1\rangle\langle 0|$ can also be represented by the matrix

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

with respect to the standard basis.


Example 4.1.2 The transformation that exchanges the basis vectors $|00\rangle$ and $|10\rangle$ and leaves the
others alone is written $|10\rangle\langle 00| + |00\rangle\langle 10| + |11\rangle\langle 11| + |01\rangle\langle 01|$ and has matrix representation

$$\begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

in the standard basis.


An operator on an $n$-qubit system that maps the basis vector $|j\rangle$ to $|i\rangle$ and all other standard
basis elements to 0 can be written

$$O = |i\rangle\langle j|$$

in the standard basis; the matrix for $O$ has a single non-zero entry 1 in the $ij$th place. A general
operator $O$ with entries $a_{ij}$ in the standard basis can be written

$$O = \sum_i \sum_j a_{ij}|i\rangle\langle j|.$$

Similarly, the $ij$th entry of the matrix for $O$ in the standard basis is given by $\langle i|O|j\rangle$.

As an example of working with this notation, we write out the result of applying operator $O$ to
a vector $|\psi\rangle = \sum_k b_k|k\rangle$:

$$O|\psi\rangle = \left(\sum_i \sum_j a_{ij}|i\rangle\langle j|\right)\left(\sum_k b_k|k\rangle\right) = \sum_i \sum_j \sum_k a_{ij} b_k |i\rangle\langle j|k\rangle = \sum_i \sum_j a_{ij} b_j |i\rangle.$$

More generally, if $\{|\beta_i\rangle\}$ is a basis for an $N$-dimensional vector space $V$, then an operator
$O : V \to V$ can be written as

$$\sum_{i=1}^{N} \sum_{j=1}^{N} b_{ij}|\beta_i\rangle\langle\beta_j|$$

with respect to this basis. In particular, the matrix for $O$ with respect to basis $\{|\beta_i\rangle\}$ has entries
$O_{ij} = b_{ij}$.
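
The matrix of Example 4.1.2 can be assembled term by term from its outer products, and its
entries recovered as $\langle i|O|j\rangle$. This is a minimal sketch, not part of the original text; the
helper names `basis_vector`, `outer`, `add`, and `entry` are hypothetical, and basis vectors are
labeled by bit strings with the usual binary ordering.

```python
def basis_vector(bits):
    """Standard basis vector for a bit string, e.g. '10' gives [0, 0, 1, 0]."""
    v = [0] * (2 ** len(bits))
    v[int(bits, 2)] = 1
    return v

def outer(u, v):
    """Matrix of |u><v| for real vectors (no conjugation needed here)."""
    return [[x * y for y in v] for x in u]

def add(m1, m2):
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

# |10><00| + |00><10| + |11><11| + |01><01|
pairs = [("10", "00"), ("00", "10"), ("11", "11"), ("01", "01")]
O = [[0] * 4 for _ in range(4)]
for ket, bra in pairs:
    O = add(O, outer(basis_vector(ket), basis_vector(bra)))

def entry(op, i, j):
    """Matrix entry <i|op|j> for standard basis labels given as bit strings."""
    bi, bj = basis_vector(i), basis_vector(j)
    op_bj = [sum(row[k] * bj[k] for k in range(len(bj))) for row in op]
    return sum(x * y for x, y in zip(bi, op_bj))
```

With these definitions `O` has rows (0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 0), and (0, 0, 0, 1),
and, for instance, `entry(O, "00", "10")` picks out the 1 in the first row and third column.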

Initially the vector/matrix notation may be easier for the reader to comprehend because it 
is more familiar, and sometimes this notation is convenient for performing calculations. But it 
requires choosing a basis and an ordering of that basis. The bra/ket notation is independent of 
the basis and the order of the basis elements. It is also more compact, and it suggests correct 
relationships, as we saw for the outer product, so that once it becomes familiar, it is easier to read. 

4.2 Projection Operators for Measurement 

Section 2.3 described measurement of a single qubit in terms of projection onto a basis vector
associated with the measurement device. This notion generalizes to measurement in multiple-qubit
systems. For any subspace $S$ of $V$, the subspace $S^\perp$ consists of all vectors that are perpendicular
to all vectors in $S$. The subspaces $S$ and $S^\perp$ satisfy $V = S \oplus S^\perp$; thus, any vector $|v\rangle \in V$ can be
written uniquely as the sum of a vector $\vec{s}_1 \in S$ and a vector $\vec{s}_2 \in S^\perp$. For any $S$, the projection
operator $P_S$ is the linear operator $P_S : V \to S$ that sends $|v\rangle \mapsto \vec{s}_1$ where $|v\rangle = \vec{s}_1 + \vec{s}_2$ with
$\vec{s}_1 \in S$ and $\vec{s}_2 \in S^\perp$. We use the notation $\vec{s}_i$ because $\vec{s}_1$ and $\vec{s}_2$ are generally not unit vectors. The
operator $|\psi\rangle\langle\psi|$ is the projection operator onto the subspace spanned by $|\psi\rangle$. Projection operators
are sometimes called projectors for short. For any direct sum decomposition of $V = S_1 \oplus \cdots \oplus S_k$
into orthogonal subspaces $S_i$ there are $k$ related projection operators $P_i : V \to S_i$ where $P_i|v\rangle = \vec{s}_i$
where $|v\rangle = \vec{s}_1 + \cdots + \vec{s}_k$ with $\vec{s}_i \in S_i$. In this terminology, a measuring device with associated
decomposition $V = S_1 \oplus \cdots \oplus S_k$ acting on a state $|\psi\rangle$ results in the state

$$\frac{P_i|\psi\rangle}{|P_i|\psi\rangle|}$$

with probability $|P_i|\psi\rangle|^2$.


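Example 4.2.1 The projector $|0\rangle\langle 0|$ acts on a single-qubit state $|\psi\rangle$ and obtains the component
of $|\psi\rangle$ in the subspace generated by $|0\rangle$. Let $|\psi\rangle = a|0\rangle + b|1\rangle$. Then $(|0\rangle\langle 0|)|\psi\rangle = a\langle 0|0\rangle|0\rangle +
b\langle 0|1\rangle|0\rangle = a|0\rangle$.

The projector $|1\rangle|0\rangle\langle 1|\langle 0|$ acts on two-qubit states. Let

$$|\phi\rangle = a_{00}|00\rangle + a_{01}|01\rangle + a_{10}|10\rangle + a_{11}|11\rangle.$$

Then

$$(|1\rangle|0\rangle\langle 1|\langle 0|)|\phi\rangle = a_{10}|1\rangle|0\rangle.$$


Let $P_S$ be the projection operator from an $n$-dimensional vector space $V$ onto an $s$-dimensional
subspace $S$ with orthonormal basis $\{|\alpha_0\rangle, \ldots, |\alpha_{s-1}\rangle\}$. Then

$$P_S = \sum_{i=0}^{s-1} |\alpha_i\rangle\langle\alpha_i| = |\alpha_0\rangle\langle\alpha_0| + \cdots + |\alpha_{s-1}\rangle\langle\alpha_{s-1}|.$$


Example 4.2.2 Let $|\psi\rangle = a_{00}|00\rangle + a_{01}|01\rangle + a_{10}|10\rangle + a_{11}|11\rangle$ represent a state of a
two-qubit system with associated vector space $V$. Let $S_1$ be the subspace spanned by $|00\rangle$, $|01\rangle$.
The operator $P_{S_1} = |00\rangle\langle 00| + |01\rangle\langle 01|$ is the projection operator that sends $|\psi\rangle$ to the
(non-normalized) vector $a_{00}|00\rangle + a_{01}|01\rangle$.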


Let $V$ and $W$ be two vector spaces with inner product. The adjoint operator or conjugate
transpose $O^\dagger : V \to W$ of an operator $O : W \to V$ is defined to be the operator that satisfies the
following inner product relation. For any $\vec{v} \in V$ and $\vec{w} \in W$, the inner product between $O^\dagger\vec{v}$
and $\vec{w}$ is the same as the inner product between $\vec{v}$ and $O\vec{w}$:

$$O^\dagger\vec{v} \cdot \vec{w} = \vec{v} \cdot O\vec{w}.$$

The matrix for the adjoint operator $O^\dagger$ of $O$ is obtained by taking the complex conjugate of all
entries and then the transpose of the matrix for $O$, where we are assuming consistent use of bases
for $V$ and $W$. Recall from section 2.2 that $\langle x|$ is the conjugate transpose of $|x\rangle$. The reader can
check that $(A|x\rangle)^\dagger = \langle x|A^\dagger$. In bra/ket notation, the relation between the inner product of $O^\dagger|x\rangle$
and $|w\rangle$ and the inner product of $|x\rangle$ and $O|w\rangle$ is reflected in the notation:

$$(\langle x|O)|w\rangle = \langle x|(O|w\rangle) = \langle x|O|w\rangle.$$

The definition of a projection operator $P$ implies that applying a projection operator many times
in succession has the same effect as just applying it once: $PP = P$. Furthermore, any projection
operator is its own adjoint: $P = P^\dagger$. Thus

$$|P|v\rangle|^2 = (\langle v|P^\dagger)(P|v\rangle) = \langle v|P|v\rangle$$

for any projection operator $P$ and all $|v\rangle \in V$.
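
These three properties are easy to confirm for a concrete projector. The following is a minimal
sketch, not part of the original text: it builds the projector onto the subspace spanned by
$|00\rangle$ and $|01\rangle$ as a sum of outer products, and the helper names (`projector`, `dagger`,
`matmul`, `close`) and the sample state are illustrative assumptions.

```python
def matmul(a, b):
    """Matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def dagger(m):
    """Conjugate transpose of a square matrix."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]

def projector(basis):
    """Sum of |alpha_i><alpha_i| over an orthonormal basis of the subspace."""
    n = len(basis[0])
    p = [[0j] * n for _ in range(n)]
    for b in basis:
        for i in range(n):
            for j in range(n):
                p[i][j] += b[i] * complex(b[j]).conjugate()
    return p

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

P = projector([[1, 0, 0, 0], [0, 1, 0, 0]])        # span{|00>, |01>}
psi = [0.5, 0.5j, -0.5, 0.5]                       # a sample unit vector

P_psi = [sum(P[i][j] * psi[j] for j in range(4)) for i in range(4)]
norm_sq = sum(abs(x) ** 2 for x in P_psi)                                 # |P|psi>|^2
expect = sum(complex(psi[i]).conjugate() * P_psi[i] for i in range(4))    # <psi|P|psi>

idempotent = close(matmul(P, P), P)
self_adjoint = close(dagger(P), P)
```

For this sample state both `norm_sq` and `expect` equal 1/2, the probability of finding the
first qubit in $|0\rangle$.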

To solidify our understanding of projection operators and Dirac’s notation, let us describe 
single-qubit measurement in the standard basis in terms of this formalism. 


Example 4.2.3 Formal treatment of single-qubit measurement in the standard basis. Let $V$ be
the vector space associated with a single-qubit system. The direct sum decomposition for $V$
associated with measurement in the standard basis is $V = S \oplus S'$, where $S$ is the subspace generated
by $|0\rangle$ and $S'$ is the subspace generated by $|1\rangle$. The related projection operators are $P : V \to S$
and $P' : V \to S'$, where $P = |0\rangle\langle 0|$ and $P' = |1\rangle\langle 1|$. Measurement of the state $|\psi\rangle = a|0\rangle + b|1\rangle$
results in the state

$$\frac{P|\psi\rangle}{|P|\psi\rangle|}$$

with probability $|P|\psi\rangle|^2$. Since

$$P|\psi\rangle = (|0\rangle\langle 0|)|\psi\rangle = |0\rangle\langle 0|\psi\rangle = a|0\rangle$$

and

$$|P|\psi\rangle|^2 = \langle\psi|P|\psi\rangle = \langle\psi|(|0\rangle\langle 0|)|\psi\rangle = \langle\psi|0\rangle\langle 0|\psi\rangle = \overline{a}a = |a|^2,$$

the result of the measurement is $\frac{a|0\rangle}{|a|}$ with probability $|a|^2$. Since by section 2.5 an overall phase
factor is physically meaningless, the state represented by $|0\rangle$ has been obtained with probability
$|a|^2$. A similar calculation shows that the state represented by $|1\rangle$ is obtained with probability $|b|^2$.


Before giving examples of more interesting measurements, we describe measurement of a 
two-qubit state with respect to the full decomposition associated with the standard basis. 


Example 4.2.4 Measuring a two-qubit state with respect to the full standard basis
decomposition. Let $V$ be the vector space associated with a two-qubit system and $|\phi\rangle = a_{00}|00\rangle +
a_{01}|01\rangle + a_{10}|10\rangle + a_{11}|11\rangle$ an arbitrary two-qubit state. Consider a measurement with
decomposition $V = S_{00} \oplus S_{01} \oplus S_{10} \oplus S_{11}$, where $S_{ij}$ is the one-dimensional complex subspace spanned
by $|ij\rangle$. The related projection operators $P_{ij} : V \to S_{ij}$ are $P_{00} = |00\rangle\langle 00|$, $P_{01} = |01\rangle\langle 01|$,
$P_{10} = |10\rangle\langle 10|$, and $P_{11} = |11\rangle\langle 11|$. The state after measurement will be $\frac{P_{ij}|\phi\rangle}{|P_{ij}|\phi\rangle|}$ with probability
$|P_{ij}|\phi\rangle|^2$. Recall from sections 2.5.1 and 3.1.3 that two unit vectors $|v\rangle$ and $|w\rangle$ represent the
same quantum state if $|v\rangle = e^{i\theta}|w\rangle$ for some $\theta$, and that $|v\rangle \sim |w\rangle$ indicates that $|v\rangle$ and $|w\rangle$
represent the same quantum state. The state after measurement is either

$$\frac{P_{00}|\phi\rangle}{|P_{00}|\phi\rangle|} = \frac{a_{00}|00\rangle}{|a_{00}|} \sim |00\rangle$$

with probability $\langle\phi|P_{00}|\phi\rangle = |a_{00}|^2$, or $|01\rangle$ with probability $|a_{01}|^2$, or $|10\rangle$ with probability
$|a_{10}|^2$, or $|11\rangle$ with probability $|a_{11}|^2$.


To develop fluency with this material, the reader may now want to rewrite, using this notation, 
the examples of section 3.3. 

More interesting are measurements that give information about the relations between qubit 
values without giving any information about the qubit values themselves. For example, we 
can measure two qubits for bit equality without determining the actual value of the bits. Such 
measurements will be used heavily in quantum error correction schemes. 


Example 4.2.5 Measuring a two-qubit state for bit equality in the standard basis. Let $V$ be
the vector space associated with a two-qubit system. Consider a measurement with associated
direct sum decomposition $V = S_1 \oplus S_2$, where $S_1$ is the subspace generated by $\{|00\rangle, |11\rangle\}$, the
subspace in which the two bits are equal, and $S_2$ is the subspace generated by $\{|10\rangle, |01\rangle\}$, the
subspace in which the two bits are not equal. Let $P_1$ and $P_2$ be the projection operators onto $S_1$ and
$S_2$ respectively. When a system in state $|\psi\rangle = a_{00}|00\rangle + a_{01}|01\rangle + a_{10}|10\rangle + a_{11}|11\rangle$ is measured
in this way, with probability $|P_i|\psi\rangle|^2 = \langle\psi|P_i|\psi\rangle$, the state after measurement becomes

$$\frac{P_i|\psi\rangle}{|P_i|\psi\rangle|}.$$

Let $c_1 = \sqrt{\langle\psi|P_1|\psi\rangle} = \sqrt{|a_{00}|^2 + |a_{11}|^2}$ and $c_2 = \sqrt{\langle\psi|P_2|\psi\rangle} = \sqrt{|a_{01}|^2 + |a_{10}|^2}$. After
measurement the state will be $|u\rangle = \frac{1}{c_1}(a_{00}|00\rangle + a_{11}|11\rangle)$ with probability $|c_1|^2 = |a_{00}|^2 + |a_{11}|^2$ and
$|v\rangle = \frac{1}{c_2}(a_{01}|01\rangle + a_{10}|10\rangle)$ with probability $|c_2|^2 = |a_{01}|^2 + |a_{10}|^2$. If the first outcome happens,
then we know that the two bit values are equal, but we do not know whether they are 0 or 1. If the
second case happens, we know that the two bit values are not equal, but we do not know which
one is 0 and which one is 1. Thus, the measurement does not determine the value of the two bits,
only whether the two bits are equal.
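
The bit-equality measurement of this example can be sketched in a few lines. This is an
illustrative sketch under stated assumptions, not part of the original text: amplitudes are
passed in the order 00, 01, 10, 11, and the function name `bit_equality_measurement` is
hypothetical.

```python
import math

def bit_equality_measurement(a00, a01, a10, a11):
    """Possible outcomes of measuring a00|00> + a01|01> + a10|10> + a11|11> for bit equality.
    Returns a dict mapping 'equal' and 'unequal' to (probability, normalized state or None)."""
    comp_equal = [a00, 0, 0, a11]      # projection onto span{|00>, |11>}
    comp_unequal = [0, a01, a10, 0]    # projection onto span{|01>, |10>}
    out = {}
    for name, comp in (("equal", comp_equal), ("unequal", comp_unequal)):
        c = math.sqrt(sum(abs(x) ** 2 for x in comp))
        state = [x / c for x in comp] if c > 1e-12 else None
        out[name] = (c * c, state)
    return out

# The uniform superposition of all four standard basis states.
result = bit_equality_measurement(0.5, 0.5, 0.5, 0.5)
```

For the uniform superposition, each outcome occurs with probability 1/2, and the "equal"
outcome leaves the system in $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, still a superposition of the two
equal-bit basis states.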


As in the case of single-qubit states, most states are a superposition with respect to a
measurement’s subspace decomposition. In the previous example, a state that is a superposition
containing components with both equal and unequal bit values is transformed by measurement
either to a state (generally still a superposition of standard basis elements), in which in all
components the bit values are equal, or to a state in which the bit values are not equal in all of the
components.

Before further developing the formalism used to describe quantum measurement, we give an
additional example, one in which the associated subspaces are not generated by subsets of the
standard basis elements.


Example 4.2.6 Measuring a two-qubit state with respect to the Bell basis decomposition. Recall
from section 3.2 the four Bell states $|\Phi^+\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, $|\Phi^-\rangle = \frac{1}{\sqrt{2}}(|00\rangle - |11\rangle)$,
$|\Psi^+\rangle = \frac{1}{\sqrt{2}}(|01\rangle + |10\rangle)$, and $|\Psi^-\rangle = \frac{1}{\sqrt{2}}(|01\rangle - |10\rangle)$. Let
$V = S_{\Phi^+} \oplus S_{\Phi^-} \oplus S_{\Psi^+} \oplus S_{\Psi^-}$ be the direct
sum decomposition into the subspaces generated by the Bell states. Measurement of the state $|00\rangle$
with respect to this decomposition yields $|\Phi^+\rangle$ with probability 1/2 and $|\Phi^-\rangle$ with probability 1/2,
because $|00\rangle = \frac{1}{\sqrt{2}}(|\Phi^+\rangle + |\Phi^-\rangle)$. The reader can determine the outcomes and their probabilities
for the three other standard basis elements, and a general two-qubit state.
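Because each subspace of the Bell decomposition is one-dimensional, the probability of an outcome is the squared magnitude of the inner product with the corresponding Bell state. A small sketch in plain Python follows; the helper names and the second sample state are assumptions made for illustration.

```python
import math

S = 1 / math.sqrt(2)
# Bell states as amplitude lists over |00>, |01>, |10>, |11>.
BELL = {
    "Phi+": [S, 0, 0, S],
    "Phi-": [S, 0, 0, -S],
    "Psi+": [0, S, S, 0],
    "Psi-": [0, S, -S, 0],
}

def inner(u, v):
    """Inner product <u|v>, conjugating the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def bell_probabilities(state):
    """Probability of each Bell outcome, |<bell|state>|^2."""
    return {name: abs(inner(b, state)) ** 2 for name, b in BELL.items()}

probs_00 = bell_probabilities([1, 0, 0, 0])            # the state |00>
probs_general = bell_probabilities([0.5, 0.5, 0.5, -0.5])  # hypothetical unit-length state
```

Here `probs_00` assigns probability 1/2 to each of $|\Phi^+\rangle$ and $|\Phi^-\rangle$, in agreement with the example; the same function can be applied to any normalized two-qubit state.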


The next section continues developing the standard formalism used throughout the quantum 
mechanics literature to describe quantum measurement. 

4.3 Hermitian Operator Formalism for Measurement 

Instead of explicitly writing out the subspace decomposition associated with a measurement, 
including the definition of each subspace of the decomposition in terms of a generating set, a 
mathematical shorthand is used. Certain operators, called Hermitian operators, define a unique 
orthogonal subspace decomposition, their eigenspace decomposition. Moreover, for every such 
decomposition, there exists a Hermitian operator whose eigenspace decomposition is this
decomposition. Given this correspondence, Hermitian operators can be used to describe measurements.
We begin by reminding our readers of definitions and facts about eigenspaces and Hermitian 
operators. 

Let $O : V \to V$ be a linear operator. Recall from linear algebra that if $Ov = \lambda v$ for some
non-zero vector $v \in V$, then $\lambda$ is an eigenvalue and $v$ is a $\lambda$-eigenvector of $O$. If both $v$ and
$w$ are $\lambda$-eigenvectors of $O$, then $v + w$ is also a $\lambda$-eigenvector (unless it is zero), so the set of all
$\lambda$-eigenvectors, together with the zero vector, forms a subspace of $V$ called the $\lambda$-eigenspace of $O$.
For an operator with a diagonal matrix representation, the eigenvalues are simply the values along the diagonal.

An operator $O : V \to V$ is Hermitian if it is equal to its adjoint, $O^\dagger = O$. The eigenspaces of
Hermitian operators have special properties. Suppose $\lambda$ is an eigenvalue of a Hermitian operator
$O$ with eigenvector $|x\rangle$. Since

\[ \lambda\langle x|x\rangle = \langle x|\lambda|x\rangle = \langle x|(O|x\rangle) = (\langle x|O^\dagger)|x\rangle = \bar{\lambda}\langle x|x\rangle, \]

$\lambda = \bar{\lambda}$, which means that all eigenvalues of a Hermitian operator are real.

To give the connection between Hermitian operators and orthogonal subspace decompositions,
we need to show that the eigenspaces $S_{\lambda_1}, S_{\lambda_2}, \ldots, S_{\lambda_k}$ of a Hermitian operator are
orthogonal and satisfy $S_{\lambda_1} \oplus S_{\lambda_2} \oplus \cdots \oplus S_{\lambda_k} = V$. For any operator, two distinct eigenvalues
have disjoint eigenspaces since, for any unit vector $|x\rangle$, $O|x\rangle = \lambda_0|x\rangle$ and $O|x\rangle = \lambda_1|x\rangle$ imply
$(\lambda_0 - \lambda_1)|x\rangle = 0$, which implies that $\lambda_0 = \lambda_1$. For any Hermitian operator, the eigenvectors for
distinct eigenvalues must be orthogonal. Suppose $|v\rangle$ is a $\lambda$-eigenvector and $|w\rangle$ is a $\mu$-eigenvector
with $\lambda \neq \mu$. Then

\[ \lambda\langle v|w\rangle = (\langle v|O^\dagger)|w\rangle = \langle v|(O|w\rangle) = \mu\langle v|w\rangle. \]

Since $\lambda$ and $\mu$ are distinct eigenvalues, $\langle v|w\rangle = 0$. Thus, $S_{\lambda_i}$ and $S_{\lambda_j}$ are orthogonal for
$\lambda_i \neq \lambda_j$. Exercise 4.16 shows that the direct sum of all of the eigenspaces for a Hermitian operator
$O : V \to V$ is the whole space $V$.

Let $V$ be an $N$-dimensional vector space, and let $\lambda_1, \lambda_2, \ldots, \lambda_k$ be the $k \le N$ distinct
eigenvalues of a Hermitian operator $O : V \to V$. We have just shown that $V = S_{\lambda_1} \oplus \cdots \oplus S_{\lambda_k}$, where
$S_{\lambda_i}$ is the eigenspace of $O$ with eigenvalue $\lambda_i$. This direct sum decomposition of $V$ is called the
eigenspace decomposition of $V$ for the Hermitian operator $O$. Thus, any Hermitian operator
$O : V \to V$ uniquely determines a subspace decomposition for $V$. Furthermore, any decomposition
of a vector space $V$ into the direct sum of subspaces $S_1, \ldots, S_k$ can be realized as the eigenspace
decomposition of a Hermitian operator $O : V \to V$: let $P_i$ be the projectors onto the subspaces
$S_i$, and let $\lambda_1, \lambda_2, \ldots, \lambda_k$ be any set of distinct real values; then $O = \sum_{i=1}^{k} \lambda_i P_i$ is a Hermitian
operator with the desired direct sum decomposition. Thus, when describing a measurement,
instead of directly specifying the associated subspace decomposition, we can specify a Hermitian
operator whose eigenspace decomposition is that decomposition.
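The construction of a Hermitian operator from labelled subspaces can be sketched in a few lines of plain Python. Everything named here (`outer`, `projector`, `observable`, the choice of labels 2 and 3) is an illustrative assumption; each subspace is supplied through an orthonormal basis, and the sample input is the bit-equality decomposition of example 4.2.5.

```python
def outer(u, v):
    """Matrix |u><v| as a nested list."""
    return [[a * complex(b).conjugate() for b in v] for a in u]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def apply(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dagger(A):
    """Conjugate transpose of a square matrix."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

def projector(basis):
    """Projector onto the span of an orthonormal list of vectors."""
    n = len(basis[0])
    P = [[0j] * n for _ in range(n)]
    for b in basis:
        P = add(P, outer(b, b))
    return P

def observable(labelled_subspaces):
    """Build O = sum_i lambda_i P_i from (lambda_i, orthonormal basis of S_i) pairs."""
    n = len(labelled_subspaces[0][1][0])
    O = [[0j] * n for _ in range(n)]
    for lam, basis in labelled_subspaces:
        O = add(O, scale(lam, projector(basis)))
    return O

e00, e01, e10, e11 = [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]
# Bit-equality decomposition, with arbitrary distinct real labels 2 and 3.
O = observable([(2, [e00, e11]), (3, [e01, e10])])
is_hermitian = (O == dagger(O))
image = apply(O, [1, 0, 0, 1])   # a vector in the "bits equal" subspace
```

The resulting matrix is diagonal with entries 2, 3, 3, 2, and vectors in each subspace are simply scaled by that subspace's label, which is all that is needed for the subspaces to be the eigenspaces of `O`.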

Any Hermitian operator with the appropriate direct sum decomposition can be used to specify 
a given measurement; in particular, the values of the $\lambda_i$ are irrelevant as long as they are distinct.
The $\lambda_i$ should be thought of simply as labels for the corresponding subspaces, or equivalently
as labels for the measurement outcomes. In quantum physics, these labels are often chosen to 
represent a shared property, such as the energy, of the eigenstates in the corresponding eigenspace. 
For our purposes, we do not need to assign labels with meaning; any distinct set of eigenvalues 
will do. 

Specifying a measurement in terms of a Hermitian operator is standard practice throughout the
quantum-mechanics and quantum-information-processing literature. It is important to recognize,
however, that quantum measurement is not modeled by the action of a Hermitian operator on
a state. The projectors $P_j$ associated with a Hermitian operator $O$, not $O$ itself, act on a state.
Which projector acts on the state depends on the probabilities $p_j = \langle\psi|P_j|\psi\rangle$. For example,
measuring $|\psi\rangle = a|0\rangle + b|1\rangle$ according to the Hermitian operator $Z = |0\rangle\langle 0| - |1\rangle\langle 1|$
does not result in the state $a|0\rangle - b|1\rangle$, even though

\[ Z|\psi\rangle = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a \\ -b \end{pmatrix}. \]

Direct multiplication by a Hermitian operator generally does not even result in a well-defined
state; for example,

\[ \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} |0\rangle = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]

The Hermitian operator is only a convenient bookkeeping trick, a concise way of specifying the
subspace decomposition associated with the measurement.
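The difference between multiplying a state by an observable and measuring with it can be made concrete. The sketch below is plain Python under stated assumptions: the amplitudes 0.6 and 0.8 are hypothetical, and `measure` simply lists every outcome with its probability rather than sampling one.

```python
import math

def apply(matrix, state):
    return [sum(m * x for m, x in zip(row, state)) for row in matrix]

def norm(state):
    return math.sqrt(sum(abs(x) ** 2 for x in state))

Z = [[1, 0], [0, -1]]
P0 = [[1, 0], [0, 0]]    # projector |0><0| onto the 1-eigenspace of Z
P1 = [[0, 0], [0, 1]]    # projector |1><1| onto the -1-eigenspace of Z

a, b = 0.6, 0.8          # hypothetical amplitudes with |a|^2 + |b|^2 = 1
psi = [a, b]

z_psi = apply(Z, psi)    # [a, -b]: a matrix product, not a measurement result

def measure(state, projectors):
    """List of (probability, normalized post-measurement state) per projector."""
    results = []
    for P in projectors:
        projected = apply(P, state)
        length = norm(projected)
        post = [x / length for x in projected] if length > 0 else None
        results.append((length ** 2, post))
    return results

outcomes = measure(psi, [P0, P1])

# Multiplying by an observable can even yield the zero vector, which is no state at all.
zero = apply(P1, [1, 0])
```

Measuring with $Z$ leaves the system in $|0\rangle$ with probability $|a|^2$ or in $|1\rangle$ with probability $|b|^2$; the vector `z_psi` never appears as an outcome.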

4.3.1 The Measurement Postulate 

Many aspects of our model of quantum mechanics are not directly observable by experiment. 
For example, as we saw in section 2.3, given a single instance of an unknown single-qubit state 
$a|0\rangle + b|1\rangle$, there is no way to determine experimentally what state it is in; we cannot directly
observe the quantum state. It is only the results of measurements that we can directly observe.
For this reason, the Hermitian operators we use to specify measurements are called observables.
The measurement postulate of quantum mechanics states that: 

• Any quantum measurement can be specified by a Hermitian operator O called an observable. 

• The possible outcomes of measuring a state $|\psi\rangle$ with an observable $O$ are labeled by the
eigenvalues of $O$. Measurement of state $|\psi\rangle$ results in the outcome labeled by the eigenvalue $\lambda_i$
of $O$ with probability $|P_i|\psi\rangle|^2$, where $P_i$ is the projector onto the $\lambda_i$-eigenspace.

• (Projection) The state after measurement is the normalized projection $P_i|\psi\rangle / |P_i|\psi\rangle|$ of $|\psi\rangle$
onto the $\lambda_i$-eigenspace $S_i$. Thus the state after measurement is a unit length eigenvector of $O$ with
eigenvalue $\lambda_i$.

We should make clear that what we have described here is a mathematical formalism for 
measurement. It does not tell us what measurements can be done in practice, or with what 
efficiency. Some measurements that may be mathematically simple to state may not be easy 
to implement. Furthermore, the eigenvalues of physically realizable measurements may have 
meaning—for example, as the position or energy of a particle—but for us the eigenvalues are just 
arbitrary labels. 

While a Hermitian operator uniquely specifies a subspace decomposition, for a given
subspace decomposition there are many Hermitian operators whose eigenspace decomposition
is that decomposition. In particular, since the eigenvalues are simply labels for the subspaces
or possible outcomes, the specific values of the eigenvalues are irrelevant; it matters only which
ones are distinct. For example, measuring with the Hermitian operator $|0\rangle\langle 0| - |1\rangle\langle 1|$ results in the
same states with the same probabilities as measuring with $100|0\rangle\langle 0| - 100|1\rangle\langle 1|$, but these
outcomes do not agree with the outcomes of the trivial measurement corresponding to the Hermitian
operator $|0\rangle\langle 0| + |1\rangle\langle 1|$ or $42|0\rangle\langle 0| + 42|1\rangle\langle 1|$.


Example 4.3.1 Hermitian operator formalism for measurement of a single qubit in the standard
basis. Using the description in example 4.2.3 of measurement of a single-qubit system in the
standard basis, let us build up a Hermitian operator that specifies this measurement. The subspace
decomposition corresponding to this measurement is $V = S \oplus S'$, where $S$ is the subspace
generated by $|0\rangle$ and $S'$ is generated by $|1\rangle$. The projectors associated with $S$ and $S'$ are $P = |0\rangle\langle 0|$ and
$P' = |1\rangle\langle 1|$ respectively. Let $\lambda$ and $\lambda'$ be any two distinct real values, say $\lambda = 2$ and $\lambda' = -3$.
Then the operator

\[ O = 2|0\rangle\langle 0| - 3|1\rangle\langle 1| = \begin{pmatrix} 2 & 0 \\ 0 & -3 \end{pmatrix} \]

is a Hermitian operator specifying the measurement of a single-qubit state in the standard
basis.

Any other distinct values for $\lambda$ and $\lambda'$ could have been used. We will generally use either

\[ |1\rangle\langle 1| = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \quad\text{or}\quad Z = |0\rangle\langle 0| - |1\rangle\langle 1| = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \]

to specify single-qubit measurements in the standard basis.


Example 4.3.2 Hermitian operator formalism for measurement of a single qubit in the Hadamard
basis. We wish to construct a Hermitian operator corresponding to measurement of a single qubit
in the Hadamard basis $\{|+\rangle, |-\rangle\}$. The subspaces under consideration are $S_+$, generated by $|+\rangle$,
and $S_-$, generated by $|-\rangle$, with associated projectors $P_+ = |+\rangle\langle +| = \frac{1}{2}(|0\rangle\langle 0| + |0\rangle\langle 1| + |1\rangle\langle 0| + |1\rangle\langle 1|)$
and $P_- = |-\rangle\langle -| = \frac{1}{2}(|0\rangle\langle 0| - |0\rangle\langle 1| - |1\rangle\langle 0| + |1\rangle\langle 1|)$. We are free to choose $\lambda_+$ and $\lambda_-$
any way we like as long as they are distinct. If we take $\lambda_+ = 1$ and $\lambda_- = -1$, then

\[ X = |0\rangle\langle 1| + |1\rangle\langle 0| = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \]

is a Hermitian operator for single-qubit measurement in the Hadamard basis.
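For readers who want the intermediate step, the operator $X$ arises from the two projectors as follows; this is just the sum $\lambda_+ P_+ + \lambda_- P_-$ written out with the labels chosen in the example.

```latex
\begin{align*}
\lambda_+ P_+ + \lambda_- P_- &= P_+ - P_- \\
  &= \tfrac{1}{2}\bigl(|0\rangle\langle 0| + |0\rangle\langle 1| + |1\rangle\langle 0| + |1\rangle\langle 1|\bigr)
   - \tfrac{1}{2}\bigl(|0\rangle\langle 0| - |0\rangle\langle 1| - |1\rangle\langle 0| + |1\rangle\langle 1|\bigr) \\
  &= |0\rangle\langle 1| + |1\rangle\langle 0| = X.
\end{align*}
```

The diagonal terms cancel and the off-diagonal terms add, so a different choice of labels would give a different matrix with the same eigenspaces.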


Example 4.3.3 The Hermitian operator $A = |01\rangle\langle 01| + 2|10\rangle\langle 10| + 3|11\rangle\langle 11|$ has matrix
representation

\[ \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} \]

with respect to the standard basis in the standard order $\{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}$. The eigenspace
decomposition for $A$ consists of four subspaces, each generated by one of the standard basis
vectors $|00\rangle$, $|01\rangle$, $|10\rangle$, $|11\rangle$. The operator $A$ is one of many Hermitian operators that specify
measurement with respect to the full standard basis decomposition described in example 4.2.4.
The Hermitian operator $A' = 73|00\rangle\langle 00| + 50|01\rangle\langle 01| - 3|10\rangle\langle 10| + 23|11\rangle\langle 11|$ is another.


Example 4.3.4 The Hermitian operator

\[ B = |00\rangle\langle 00| + |01\rangle\langle 01| + \pi(|10\rangle\langle 10| + |11\rangle\langle 11|) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \pi & 0 \\ 0 & 0 & 0 & \pi \end{pmatrix} \]

specifies measurement of a two-qubit system with respect to the subspace decomposition
$V = S_0 \oplus S_1$, where $S_0$ is generated by $\{|00\rangle, |01\rangle\}$ and $S_1$ is generated by $\{|10\rangle, |11\rangle\}$, so $B$ specifies
measurement of the first qubit in the standard basis as described in example 3.3.3.


Example 4.3.5 The Hermitian operator

\[ C = 2(|00\rangle\langle 00| + |11\rangle\langle 11|) + 3(|01\rangle\langle 01| + |10\rangle\langle 10|) = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix} \]

specifies measurement with respect to the subspace decomposition $V = S_2 \oplus S_3$, where $S_2$ is
generated by $\{|00\rangle, |11\rangle\}$ and $S_3$ is generated by $\{|01\rangle, |10\rangle\}$, so $C$ specifies the measurement for
bit equality described in example 4.2.5.


Given the subspace decomposition for a Hermitian operator $O$, it is possible to find an
orthonormal eigenbasis of $V$ for $O$. If $O$ has $n$ distinct eigenvalues, where $n$ is the dimension of $V$,
as in the general case, the eigenbasis is unique up to length one complex factors. If $O$ has fewer
than $n$ eigenvalues, some of the eigenvalues are associated with an eigenspace of more than one
dimension. In this case, an arbitrary orthonormal basis can be chosen for each eigenspace $S_i$. The
matrix for the Hermitian operator $O$ with respect to any of these eigenbases is diagonal.

Any Hermitian operator $O$ with eigenvalues $\lambda_j$ can be written as $O = \sum_j \lambda_j P_j$, where $P_j$ are
the projectors for the $\lambda_j$-eigenspaces of $O$. Every projector is Hermitian with eigenvalues 1 and
0 where the 1-eigenspace is the image of the operator. For an $m$-dimensional subspace $S$ of $V$
spanned by the orthonormal basis $\{|\alpha_1\rangle, \ldots, |\alpha_m\rangle\}$, the associated projector

\[ P_S = \sum_{j=1}^{m} |\alpha_j\rangle\langle\alpha_j| \]

maps vectors in $V$ into $S$. If $P_S$ and $P_T$ are projectors for orthogonal subspaces $S$ and $T$, the
projector for the direct sum $S \oplus T$ is $P_S + P_T$. If $P$ is a projector onto subspace $S$ then $\mathrm{tr}(P)$,
the sum of the diagonal elements of any matrix representing $P$, is the dimension of $S$. This
holds for any basis since the trace is basis independent. Box 10.1 describes this and
other properties of the trace.

Given linear operators $A$ and $B$ on vector spaces $V$ and $W$ respectively, the tensor product
$A \otimes B$ acts on elements $v \otimes w$ of the tensor product space $V \otimes W$ as follows:

\[ (A \otimes B)(v \otimes w) = Av \otimes Bw. \]

It follows from this definition that

\[ (A \otimes B)(C \otimes D) = AC \otimes BD. \]

Let $O_0$ and $O_1$ be Hermitian operators on spaces $V_0$ and $V_1$ respectively. Then $O_0 \otimes O_1$ is a
Hermitian operator on the space $V_0 \otimes V_1$. Furthermore, if $O_i$ has eigenvalues $\lambda_{ij}$ with associated
eigenspaces $S_{ij}$, then $O_0 \otimes O_1$ has eigenvalues $\lambda'_{jk} = \lambda_{0j}\lambda_{1k}$. If an eigenvalue $\lambda'_{jk} = \lambda_{0j}\lambda_{1k}$ is
unique, then its associated eigenspace $S'_{jk}$ is the tensor product of $S_{0j}$ and $S_{1k}$. In general, the
eigenvalues $\lambda'_{jk}$ need not be distinct. An eigenvalue $\lambda'$ of $O_0 \otimes O_1$ that is the product of eigenvalues
of $O_0$ and $O_1$ in multiple ways, $\lambda' = \lambda'_{j_1 k_1} = \cdots = \lambda'_{j_m k_m}$, has eigenspace

\[ S = (S_{0j_1} \otimes S_{1k_1}) \oplus \cdots \oplus (S_{0j_m} \otimes S_{1k_m}). \]

Most Hermitian operators $O$ on $V_0 \otimes V_1$ cannot be written as a tensor product of two Hermitian
operators $O_0$ and $O_1$ acting on $V_0$ and $V_1$ respectively. Such a decomposition is possible only if
each subspace in the subspace decomposition described by $O$ can be written as $S = S_0 \otimes S_1$, or as a
direct sum of subspaces of this form, for $S_0$ and $S_1$ in the subspace decompositions associated to
$O_0$ and $O_1$ respectively. While for most Hermitian operators this condition does not hold, it does
hold for all of the observables we have described so far. For example,

\[ \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \otimes \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix} = (|0\rangle\langle 0| - |1\rangle\langle 1|) \otimes (2|0\rangle\langle 0| + 3|1\rangle\langle 1|) = 2|00\rangle\langle 00| + 3|01\rangle\langle 01| - 2|10\rangle\langle 10| - 3|11\rangle\langle 11| \]

specifies the full measurement in the standard basis, but with a different Hermitian operator from
the one used in example 4.3.3. The operator

\[ \begin{pmatrix} 1 & 0 \\ 0 & \pi \end{pmatrix} \otimes I = |00\rangle\langle 00| + |01\rangle\langle 01| + \pi(|10\rangle\langle 10| + |11\rangle\langle 11|) \]

specifies measurement of the first qubit in the standard basis as described in example 4.3.4, as
does $Z \otimes I$, where $Z = |0\rangle\langle 0| - |1\rangle\langle 1|$. The Hermitian operator

\[ Z \otimes Z = |00\rangle\langle 00| - |01\rangle\langle 01| - |10\rangle\langle 10| + |11\rangle\langle 11| \]

specifies the measurement for bit equality described in example 4.3.5. We now give an example
of a two-qubit measurement that cannot be expressed as the tensor product of two single-qubit
measurements.
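The tensor products of observables above can be checked with a Kronecker product on matrices. The following plain-Python sketch is an illustration only; `kron` and `matmul` are hypothetical helper names, and the mixed-product identity is verified on a single instance rather than proved.

```python
def kron(A, B):
    """Kronecker (tensor) product of two matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
X = [[0, 1], [1, 0]]

ZZ = kron(Z, Z)
ZI = kron(Z, I2)
# Diagonal entries are the labels attached to |00>, |01>, |10>, |11>.
diag_ZZ = [ZZ[i][i] for i in range(4)]
diag_ZI = [ZI[i][i] for i in range(4)]

# One instance of the identity (A (x) B)(C (x) D) = AC (x) BD.
lhs = matmul(kron(Z, X), kron(X, Z))
rhs = kron(matmul(Z, X), matmul(X, Z))
```

The diagonal of `ZZ` gives the same label to $|00\rangle$ and $|11\rangle$ and a different label to $|01\rangle$ and $|10\rangle$, which is the bit-equality decomposition, while the diagonal of `ZI` groups basis vectors by the value of the first qubit.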


Example 4.3.6 Not all measurements are tensor products of single-qubit measurements. Consider
a two-qubit state. The observable $M$ with matrix representation

\[ M = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]

determines whether both bits are set to one. Measurement with the operator $M$ results in a state
contained in one of the two subspaces $S_0$ and $S_1$, where $S_1$ is the subspace spanned by $\{|11\rangle\}$ and
$S_0$ is spanned by $\{|00\rangle, |01\rangle, |10\rangle\}$.

Measuring with $M$ is quite different from measuring both qubits in the standard basis and then
performing the classical and operation. For instance, the state $|\psi\rangle = \frac{1}{\sqrt{2}}(|01\rangle + |10\rangle)$ remains
unchanged when measured with $M$, but measuring both qubits of $|\psi\rangle$ would result in either the
state $|01\rangle$ or $|10\rangle$.
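The contrast between the two measurements can be seen by listing the possible outcomes for $\frac{1}{\sqrt{2}}(|01\rangle + |10\rangle)$ under each decomposition. This is a minimal plain-Python sketch; `measure` is a hypothetical helper that handles only subspaces spanned by standard basis vectors, identified by their indices.

```python
import math

def project(state, indices):
    return [amp if i in indices else 0.0 for i, amp in enumerate(state)]

def measure(state, subspaces):
    """Return (probability, post-measurement state) for each subspace, where every
    subspace is spanned by the standard basis vectors at the given indices."""
    results = []
    for indices in subspaces:
        projected = project(state, indices)
        p = sum(abs(amp) ** 2 for amp in projected)
        post = [amp / math.sqrt(p) for amp in projected] if p > 0 else None
        results.append((p, post))
    return results

s = 1 / math.sqrt(2)
psi = [0, s, s, 0]                              # (|01> + |10>)/sqrt(2)

with_M = measure(psi, [(0, 1, 2), (3,)])        # S_0 and S_1 of the observable M
full = measure(psi, [(0,), (1,), (2,), (3,)])   # both qubits in the standard basis
```

With `M`, the outcome associated with $S_0$ occurs with probability 1 and returns the original superposition; the full standard-basis measurement instead yields $|01\rangle$ or $|10\rangle$, each with probability 1/2.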


Any Hermitian operator $Q_1 \otimes Q_2$ on a two-qubit system is said to be composed of single-qubit
measurements if $Q_1$ and $Q_2$ are Hermitian operators on the single-qubit systems. Furthermore, any
Hermitian operator of the form $Q \otimes I$ or $I \otimes Q'$ on a two-qubit system is said to be a measurement
on a single qubit of the system. More generally, a Hermitian operator of the form

\[ I \otimes \cdots \otimes I \otimes Q \otimes I \otimes \cdots \otimes I \]

on an $n$-qubit system is said to be a single-qubit measurement of the system. Any Hermitian
operator of the form $A \otimes I$ on a system $V \otimes W$, where $A$ is a Hermitian operator acting on $V$, is
said to be a measurement of subsystem $V$.

Section 5.1 shows that measurement operators in the standard basis, when combined with 
quantum state transformations, are sufficient to perform arbitrary quantum measurements. In
particular, there are quantum operations taking any basis to any other, so we can get all possible
subspace decompositions of the state space by starting with a subspace decomposition in which 
all of the subspaces are generated by standard basis vectors and transforming. Understanding 
the effects of quantum measurement in different bases is crucial for a thorough understanding of 
entangled states and quantum information processing generally. Sections 2.4 and 3.4 illustrate the 
power of measuring in different bases as a key aspect of these quantum key distribution schemes. 
The next section turns to Bell’s theorem, which further illustrates this point while at the same
time giving deeper insight into nonclassical properties of entangled states.

When talking about measurement of an $n$-qubit system, there are two totally distinct types of
decompositions of the vector space $V$ under consideration: the tensor product decomposition into
the $n$ separate qubits and the direct sum decomposition into $k \le 2^n$ subspaces associated with
the measuring device. These decompositions could not be more different. In particular, a tensor
component $V_i$ of $V = V_1 \otimes \cdots \otimes V_n$ is not a subspace of $V$. Similarly, the subspaces associated
with measurements do not correspond to the subsystems, such as individual qubits, of the whole
system.

Section 2.3 mentioned that only one classical bit of information can be extracted from a single 
qubit. We can now both generalize this statement and make it more precise. Since any observable 
on an $n$-qubit system has at most $2^n$ distinct eigenvalues, there are at most $2^n$ possible results
of a given measurement. Thus, a single measurement of an $n$-qubit system will reveal at most $n$
bits of classical information. Since, in general, the measurement changes the state, any further
measurements give information about the new state, not the original one. In particular, if the
observable has $2^n$ distinct eigenvalues, measurement sends the state to an eigenvector, and further
measurement cannot extract any additional information about the original state. 

4.4 EPR Paradox and Bell's Theorem 

In 1935, Albert Einstein, Boris Podolsky, and Nathan Rosen wrote a paper entitled “Can
quantum-mechanical description of physical reality be considered complete?”. The paper contained a
thought experiment that inspired the simpler thought experiment, due to David Bohm, that we
describe here. The experiment involves a pair of photons in the state $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. Pairs of
particles in such a state are called EPR pairs in honor of Einstein, Podolsky, and Rosen, even
though such states did not appear in their paper.

Imagine a source that generates EPR pairs $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ and sends the first particle to Alice
and the second to Bob. Alice and Bob can be arbitrarily far apart. Each person can perform
measurements only on the particle he or she receives. More precisely, Alice can use only observables
of the form $O \otimes I$ to measure the system, and Bob can use only observables of the form $I \otimes O'$,
where $O$ and $O'$ are single-qubit observables.



As we saw when we analyzed the Ekert91 quantum key distribution protocol in section 3.4,
if Alice measures her particle in the standard single-qubit basis, and observes the state $|0\rangle$, the
effect of this measurement is to project the state of the quantum system onto that part of the
state compatible with the results of Alice’s measurement, so the combined state will now be
$|00\rangle$. If Bob now measures his particle, he will always observe $|0\rangle$. Thus it appears that Alice’s
measurement has affected the state of Bob’s particle. Similarly, if Alice measures $|1\rangle$, so will
Bob. By symmetry, if Bob were to measure his qubit first, Alice would observe the same result as
Bob. When measuring in the standard basis, Alice and Bob will always observe the same results,
regardless of the relative timing. The probability that either qubit is measured to be $|0\rangle$ is 1/2, but
the two results are always correlated.

If these particles are far enough apart and the measurements happen close in time (more 
specifically, if the measurements are relativistically spacelike separated), it may sound as if an 
interaction between these particles is happening faster than the speed of light. We said earlier that 
a measurement performed by Alice appears to affect the state of Bob’s particle, but this wording is 
misleading. Following special relativity, it is incorrect to think of one measurement happening first 
and causing the results of the other; it is possible to set up the EPR scenario so that one observer 
sees Alice measure first, then Bob, while another observer sees Bob measure first, then Alice. 
According to relativity, physics must explain equally well the observations of both observers. 
While the causal terminology we used cannot be compatible with both observers’ observations, 
the actual experimental values are invariant under change of observer; the experimental results 
can be explained equally well by Bob measuring first and then Alice as the other way around. 
This symmetry shows that while there is correlation between the two particles, Alice and Bob cannot
use their EPR pair to communicate faster than the speed of light. All that can be said is that Alice 
and Bob will observe correlated random behavior. 

Even though the results themselves are perfectly compatible with relativity theory, the behavior 
remains mysterious. If Alice and Bob had a large number of EPR pairs that they measure in 
sequence, they would see an odd mixture of correlated and probabilistic results: each of their
sequences of measurements appears completely random, but if Alice and Bob compare their results,
they see that they witnessed the same random sequence from their two separate particles. Their 
sequence of entangled pairs behaves like a pair of magic coins that always land the same way up 
when tossed together, but whether they both land heads or both land tails is completely random. 
So far, quantum mechanics is not the only theory that can explain these results; they could also 
be explained by a classical theory that postulates that particles have an internal hidden state that 
determines the result of the measurement, and that this hidden state is identical in two particles 
generated at the same time by the EPR source, but varies randomly over time as the pairs are 
generated. According to such a classical theory, the reason we see random instead of deterministic 
results is simply because we, as of yet, have no way of accessing these hidden states. The hope of 
proponents of such theories was that eventually physics would advance to a stage in which this 
hidden state would be known to us. Such theories are known as local hidden-variable theories. 
The local part comes from the assumption that the hidden variables are internal to each of the
particles and do not depend on external influences; in particular, the hidden variables do not
depend on the state of faraway particles or measuring devices.

Is it possible to construct a local hidden-variable theory that agrees with all of the experimental
results we use quantum mechanics to model? The answer is “no,” but it was not until Bell’s work
of 1964 that anyone realized that it was possible to construct experiments that could distinguish 
quantum mechanics from all local hidden-variable theories. Since then such experiments have 
been done, and all of the results have agreed with those predicted by quantum mechanics. Thus, 
no local hidden-variable theory whatsoever can explain how nature truly works. 

Bell showed that any local hidden variable theory predicts results that satisfy an inequality, 
known as Bell’s inequality. Section 4.4.1 presents the setup. Section 4.4.2 describes the results 
predicted by quantum theory. Section 4.4.3 establishes Bell’s inequality for any local hidden variable
theory in a special case. Section 4.4.4 gives Bell’s inequality in full generality. 


4.4.1 Setup for Bell's Theorem 

Imagine an EPR source that emits pairs of photons whose polarizations are in an entangled state
$|\psi\rangle = \frac{1}{\sqrt{2}}(|{\uparrow\uparrow}\rangle + |{\rightarrow\rightarrow}\rangle)$, where we are using the notation $|{\uparrow}\rangle$ and $|{\rightarrow}\rangle$ for photon polarization
of section 2.1.2. We suppose that the two photons travel in opposite directions, each toward a
polaroid (polarization filter). These polaroids can be set at three different angles. In the special
case we consider first, the polaroids can be set to vertical, $+60^\circ$ off vertical, and $-60^\circ$ off vertical.



4.4.2 What Quantum Mechanics Predicts 

Let $O_\theta$ be a single-qubit observable with 1-eigenspace generated by $|v\rangle = \cos\theta|0\rangle + \sin\theta|1\rangle$
and $-1$-eigenspace generated by $|v^\perp\rangle = -\sin\theta|0\rangle + \cos\theta|1\rangle$. Quantum mechanics predicts
that measurement of $|\psi\rangle$ with $O_{\theta_1} \otimes O_{\theta_2}$ results in a state with eigenvalue 1 with probability
$\cos^2(\theta_1 - \theta_2)$. In other words, the probability that the state ends up in the subspace
generated by $\{|v_1\rangle|v_2\rangle, |v_1^\perp\rangle|v_2^\perp\rangle\}$, and not the $-1$-eigenspace generated by
$\{|v_1\rangle|v_2^\perp\rangle, |v_1^\perp\rangle|v_2\rangle\}$, is $\cos^2(\theta_1 - \theta_2)$. Proving this fact is the subject of exercise 4.20.
Here we describe its surprising nonclassical implications.

The three different settings for each polaroid, $-60^\circ$, vertical, and $+60^\circ$, correspond to three
observables, $M_{\nwarrow}$, $M_{\uparrow}$, and $M_{\nearrow}$, each with two possible outcomes: either the photon passes
through the polaroid, an outcome we will denote with P, or it is absorbed, an outcome we will
denote with A. Using the fact that measurement with observable $O_{\theta_1} \otimes O_{\theta_2}$ results in a state with
eigenvalue 1 with probability $\cos^2(\theta_1 - \theta_2)$, we can compute the probability that measurements
of two photons, by polaroids set at angles $\theta_1$ and $\theta_2$, give the same result, PP or AA. If both
polaroids are set at the same angle, then both photon measurements give the same results with
probability $\cos^2 0 = 1$: both photons will pass through the polaroids, or both will be absorbed.
When the polaroid on the right is set to vertical, and the one on the left is set to $+60^\circ$, both
measurements agree with probability $\cos^2 60^\circ = 1/4$. Unless the two polaroids are set at the same
angle, the difference between the angles is either 60 or 120 degrees, so in all of these cases the
two measurements agree 1/4 of the time and disagree 3/4 of the time.

If the polaroids are set randomly for a series of EPR pairs emanating from the source, then 

• with probability 1/3 the polaroid orientation will be the same and the measurements will agree,
and 

• with probability 2/3 the polaroid orientation will differ and the measurements will agree with 
probability 1/4. 

Thus, overall, the measurements will agree half the time and disagree half the time. When such 
an experiment is performed, these are indeed the probabilities that are seen. 
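These agreement probabilities can be reproduced directly from the state vector. The sketch below is plain Python under stated assumptions: it identifies $|{\uparrow}\rangle$ with $|0\rangle$ and $|{\rightarrow}\rangle$ with $|1\rangle$, measures angles from the vertical, and uses hypothetical helper names.

```python
import math

def basis(theta):
    """Eigenvectors of O_theta: |v> (eigenvalue 1) and |v_perp> (eigenvalue -1)."""
    v = (math.cos(theta), math.sin(theta))
    v_perp = (-math.sin(theta), math.cos(theta))
    return v, v_perp

def amplitude(u, w, state):
    """<u w|state> for real single-qubit vectors u, w and a two-qubit state."""
    return sum(u[i] * w[j] * state[2 * i + j] for i in range(2) for j in range(2))

def agreement_probability(theta1, theta2, state):
    """Probability that both photons pass or both are absorbed."""
    v1, v1p = basis(theta1)
    v2, v2p = basis(theta2)
    return amplitude(v1, v2, state) ** 2 + amplitude(v1p, v2p, state) ** 2

s = 1 / math.sqrt(2)
epr = [s, 0, 0, s]                                   # (|00> + |11>)/sqrt(2)
angles = [math.radians(d) for d in (-60, 0, 60)]

table = {(a, b): agreement_probability(a, b, epr) for a in angles for b in angles}
overall = sum(table.values()) / len(table)           # settings chosen uniformly at random
```

Averaging over the nine equally likely pairs of settings gives $(3 \cdot 1 + 6 \cdot \frac{1}{4})/9 = \frac{1}{2}$, the overall agreement rate stated above.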

4.4.3 Special Case of Bell's Theorem: What Any Local Hidden Variable Theory Predicts 

This section shows that no local hidden-variable theory can give these probabilities. Suppose
there is some hidden state associated with each photon that determines the result of measuring the
photon with a polaroid in each of the three possible settings. We do not know the nature of such a
state, but there are only $2^3$ binary combinations in which these states can respond to measurement
by polaroids in the 3 orientations. We label these 8 possibilities $h_0, \ldots, h_7$.

We can think of $h_j$ as the equivalence class of all hidden states, however these might look, that give
the indicated measurement results. Experimentally, it has been established that both polaroids,
when set at the same angle, always give the same result when measuring the photons of an EPR
pair $|\psi\rangle$. For a local hidden-variable theory to have any chance of modeling experimental results,
it must predict that both photons of the entangled pair be in the same equivalence class of hidden
states $h_j$. For example, if the photon on the right responds to the three polaroid positions
$\nearrow$, $\uparrow$, $\nwarrow$ with PAP, then so must the photon on the left.

Now consider the 9 possible combinations of orientations of the two polaroids

\[ \{(\nearrow, \nearrow), (\nearrow, \uparrow), \ldots, (\nwarrow, \nwarrow)\} \]

and the expected agreement of the measurements for photon pairs in each hidden state $h_j$.
Measurements on hidden states $h_0$ and $h_7$ ({PPP, PPP} and {AAA, AAA}) agree for all possible
pairs of orientations, giving 100 percent agreement. Measurements of the hidden state $h_1$, {PPA,
PPA}, agree in five of the nine possible orientations (the three in which both polaroids have the
same setting and the two that combine the two settings with equal responses) and disagree in the
others. The remaining five cases are similar to $h_1$, giving 5/9 agreement and 4/9 disagreement.
No matter with what probability distribution the EPR source emits photons with hidden states,
the expected agreement between the two measurements will be at least 5/9. Thus, no local
hidden-variable theory can give the 50-50 agreement predicted by quantum theory and seen in
experiments.
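The counting argument is small enough to enumerate exhaustively. In this plain-Python sketch the three orientations are simply indexed 0, 1, 2, and a hidden state is represented by its three responses; both conventions are assumptions made for illustration.

```python
from itertools import product
from fractions import Fraction

SETTINGS = range(3)   # the three polaroid orientations, indexed 0, 1, 2

def agreement_fraction(hidden):
    """Fraction of the 9 orientation pairs on which both photons respond alike,
    given that both photons carry the same hidden responses."""
    pairs = list(product(SETTINGS, repeat=2))
    agree = sum(hidden[left] == hidden[right] for left, right in pairs)
    return Fraction(agree, len(pairs))

# All 2^3 response patterns, from PPP to AAA.
fractions_by_state = {
    "".join(h): agreement_fraction(h) for h in product("PA", repeat=3)
}
lowest = min(fractions_by_state.values())
```

Every pattern yields agreement on either all nine pairs or exactly five of them, so any mixture of hidden states agrees at least 5/9 of the time, which exceeds the 1/2 predicted by quantum mechanics.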

4.4.4 Bell's Inequality 

Bell’s inequality is an elegant generalization of the preceding argument. The more general setup 
also has a sequence of EPR pairs emanating from a photon source toward two polaroids, with 
three possible settings. We now consider polaroids that can be set at any triple of three distinct 
angles a, b, and c. 

If we record the results of repeated measurements at random settings of the polaroids, chosen 
from the settings above, we can count the number of times that the measurements match for any 
pair of settings. Let $P_{xy}$ denote the sum of the observed probability that either

• the two photons interact in the same way with both polaroids (either both pass through, or both 
are absorbed) when the first polaroid is set at angle x and the second at angle y, or 

• the two photons interact in the same way with both polaroids when the first polaroid is set at 
angle y and the second at angle x. 

Since the measurements of the two photons always give the same result whenever the two polaroids
are on the same setting, $P_{xx} = 1$ for any setting $x$. We now show that the inequality

\[ P_{ab} + P_{ac} + P_{bc} \ge 1, \]

known as Bell’s inequality, holds for any local hidden-variable theory and any sequence of settings
for each of the polaroids.

We establish this inequality by showing that the inequality holds for the probabilities associated 
with any one equivalence class of hidden states, from which we deduce that it holds for any 
distribution of these equivalence classes. According to any local hidden-variable theory, the 
result of measuring a photon by a polaroid in each of the three possible settings is determined by 
a local hidden state h of the photon. Again, we think of h as an equivalence class of all hidden 
states that give the indicated measurement results. The fact that both polaroids, when set at the 
same angle, always give the same result when measuring the photons in an EPR state $|\psi\rangle$ means 
that both photons of the entangled pair must be in the same equivalence class of hidden states 
$h$. For example, if the photon on the right responds to the three polaroid positions a, b, c with 
PAP, then so must the photon on the left. Let $P^h_{xy}$ be 1 if the results of the two measurements, 
at settings $x$ and $y$, agree on states with hidden variable $h$, and 0 otherwise. Since any measurement 
has only two possible results, P and A, simple logic tells us that the result of measuring a photon, 
with a given hidden state $h$, in each of the three polaroid settings, a, b, and c, will be the same for 
at least two of the settings. Thus, since the two photons of state $|\psi\rangle$ are in the same hidden state, 
for any $h$, 

$$P^h_{ab} + P^h_{ac} + P^h_{bc} \ge 1.$$ 

Let $w_h$ be the probability with which the source emits photons of kind $h$. Then the sum of 
the observed probabilities $P_{ab} + P_{ac} + P_{bc}$ is a weighted sum, with weights $w_h$, of the results for 
photons of each hidden kind $h$: 

$$P_{ab} + P_{ac} + P_{bc} = \sum_h w_h \left(P^h_{ab} + P^h_{ac} + P^h_{bc}\right).$$ 

A weighted average of numbers that are all at least 1 is itself at least 1, so since 
$P^h_{ab} + P^h_{ac} + P^h_{bc} \ge 1$ for any $h$, we may conclude that 

$$P_{ab} + P_{ac} + P_{bc} \ge 1.$$ 

This inequality holds for any local hidden-variable theory and gives us a testable requirement. 

By exercise 4.20, quantum theory predicts that the probability that the two results will be the 
same is the square of the cosine of the angle between the two polaroid settings. If we take the 
angle between settings $a$ and $b$ to be $\theta$ and the angle between settings $b$ and $c$ to be $\phi$, then the 
inequality becomes 

$$\cos^2\theta + \cos^2\phi + \cos^2(\theta + \phi) \ge 1.$$ 

For the special case of section 4.4.3, quantum theory tells us that for $\theta = \phi = 60°$ each term is 1/4. 
Since 3/4 < 1, these probabilities violate Bell’s inequality, and therefore we can conclude that 
no local, deterministic theory can give the same predictions as quantum mechanics. Furthermore, 
experiments similar to but somewhat more sophisticated than the setup described here have been 
done, and their results confirm the prediction of quantum theory and nature’s violation of Bell-like 
inequalities. 
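
The counting argument and the quantum prediction can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `agreement_sum` and `quantum_sum` are hypothetical, and a hidden state is modeled simply as a triple of responses (P or A) to the settings a, b, and c.

```python
import math
from itertools import product

def agreement_sum(h):
    """P^h_ab + P^h_ac + P^h_bc for one hidden state h = (response at a, b, c)."""
    a, b, c = h
    # Both photons share h, so measurements at settings x and y agree
    # exactly when h prescribes the same response at x and at y.
    return int(a == b) + int(a == c) + int(b == c)

# All 8 equivalence classes of hidden states: each setting yields P or A.
hidden_states = list(product("PA", repeat=3))
min_classical = min(agreement_sum(h) for h in hidden_states)

def quantum_sum(theta_deg, phi_deg):
    """cos^2(theta) + cos^2(phi) + cos^2(theta + phi), in degrees."""
    t, p = math.radians(theta_deg), math.radians(phi_deg)
    return math.cos(t) ** 2 + math.cos(p) ** 2 + math.cos(t + p) ** 2

print(min_classical)        # smallest sum over all hidden states: 1
print(quantum_sum(60, 60))  # approximately 0.75, below the classical bound
```

Because every hidden state gives a sum of at least 1, any weighted average over hidden states does too, while the quantum value for 60° settings falls below that bound.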

Bell’s theorem shows that it is not possible to model entangled states and their measurement 
with a local hidden-variable theory. Strictly speaking, entangled states should not be talked about 
in terms of local hidden states or cause and effect. But since there are some situations in which 
entanglement can be safely talked about in one or the other of these ways, and since both are more 
familiar than the sort of quantum correlation that actually exists, terminology suggesting either 
of these modes of thinking persists in the literature. 

4.5 References 

The original Einstein, Podolsky, Rosen paper [109] is worth reading for an account of their 
thinking. The first formulation of the paradox as we presented it here is due to Bohm [54]. 

Our account of Bell’s inequalities is loosely based on Penrose’s excellent account [225] of a 
special case of Bell’s theorem for spin-1/2 particles. Greenstein and Zajonc [140] give a detailed 
description, accessible to nonphysicists, of Bell’s theorem and the EPR paradox, experimental 
techniques for generating entangled photon pairs, and Aspect’s experiments testing for quantum 
violation of Bell’s inequalities. Detailed results of the experiments by Aspect et al. are published 
in [25, 26, 24]. 

Stronger statements than the ones we presented can be made about the sorts of theories that 
Bell’s inequality rules out. The issues here can be relatively subtle. Mermin’s article [208] gives a 
readable account of some of these issues. Peres’s book [226] delves into these issues in detail. For 
a discussion of the various interpretations of quantum mechanics and their perceived strengths 
and weaknesses, see Sudbery’s book [267] and Bub’s book [71]. 

4.6 Exercises 


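Exercise 4.1. Give the matrix, in the standard basis, for the following operators 

a. $|0\rangle\langle 0|$. 

b. $|+\rangle\langle 0| - i|-\rangle\langle 1|$. 

c. $|00\rangle\langle 00| + |01\rangle\langle 01|$. 

d. $|00\rangle\langle 00| + |01\rangle\langle 01| + |11\rangle\langle 01| + |10\rangle\langle 11|$. 

e. $|\Psi^+\rangle\langle\Psi^+|$ where $|\Psi^+\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. 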


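Exercise 4.2. Write the following operators in bra/ket notation 

a. The Hadamard operator $H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$. 

e. $\begin{pmatrix} 23 & 0 & 0 & 0 \\ 0 & -5 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 9 \end{pmatrix}$. 

f. $A \otimes A$. 

g. $A \otimes Z$. 

h. $H \otimes H$. 

i. The projection operators $P_1 : V \to S_1$ and $P_2 : V \to S_2$, where $S_1$ is spanned by 
$\{|+\rangle|+\rangle, |-\rangle|-\rangle\}$ and $S_2$ is spanned by $\{|+\rangle|-\rangle, |-\rangle|+\rangle\}$. 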
Exercise 4.3. Show that any projection operator is its own adjoint. 

Exercise 4.4. Rewrite example 3.3.2 on page 42 in terms of projection operators. 

Exercise 4.5. Rewrite example 3.3.3 on page 42 in terms of projection operators. 

Exercise 4.6. Rewrite example 3.3.4 on page 43 in terms of projection operators. 


Exercise 4.7. Using the projection operator formalism 

a. compute the probability of each of the possible outcomes of measuring the first qubit of an 
arbitrary two-qubit state in the Hadamard basis $\{|+\rangle, |-\rangle\}$. 

b. compute the probability of each outcome for such a measurement on the state 
$|\Psi^+\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. 

c. for each possible outcome in (b), describe the possible outcomes if we now measure the second 
qubit in the standard basis. 

d. for each possible outcome in (b), describe the possible outcomes if we now measure the second 
qubit in the Hadamard basis. 


Exercise 4.8. Show that $(A|x\rangle)^\dagger = \langle x|A^\dagger$. 


Exercise 4.9. Design a measurement on a three-qubit system that distinguishes between states in 
which all bit values are equal and those in which they are not, and gives no other information. 
Write all operators in bra/ket notation. 

Exercise 4.10. Design a measurement on a three-qubit system that distinguishes between states 
in which the number of 1 bits is even, and those in which the number of 1 bits is odd, and gives 
no other information. Write all operators in bra/ket notation. 

Exercise 4.11. Design a measurement on a three-qubit system that distinguishes between states 
with different numbers of 1 bits and gives no other information. Write all operators in bra/ket 
notation. 


Exercise 4.12. Suppose $O$ is a measurement operator corresponding to a subspace decomposition 
$V = S_1 \oplus S_2 \oplus S_3 \oplus S_4$ with projection operators $P_1$, $P_2$, $P_3$, and $P_4$. Design a measurement 
operator for the subspace decomposition $V = S_5 \oplus S_6$, where $S_5 = S_1 \oplus S_2$ and $S_6 = S_3 \oplus S_4$. 

Exercise 4.13. 

a. Let $O$ be any observable specifying a measurement of an $n$-qubit system. Suppose that after 
measuring $|\psi\rangle$ according to $O$, we obtain $|\phi\rangle$. Show that if we now measure $|\phi\rangle$ according to $O$, 
we simply obtain $|\phi\rangle$ again, with certainty. 

b. Reconcile the result of (a) with the fact that for most observables $O$ it is not true that $O^2 = O$. 
Exercise 4.14. 

a. Give the outcomes and their probabilities for measurement of each of the standard basis 
elements with respect to the Bell decomposition of example 4.2.6. 

b. Give the outcomes and their probabilities for measurement of a general two-qubit state 
$|\psi\rangle = a_{00}|00\rangle + a_{01}|01\rangle + a_{10}|10\rangle + a_{11}|11\rangle$ with respect to the Bell decomposition. 

Exercise 4.15. 

a. Show that the operator $B$ of example 4.3.4 is of the form $Q \otimes I$, where $Q$ is a $2 \times 2$ Hermitian 
operator. 

b. Show that any operator of the form $Q \otimes I$, where $Q$ is a $2 \times 2$ Hermitian operator and $I$ is the 
$2 \times 2$ identity operator, specifies a measurement of a two-qubit system. Describe the subspace 
decomposition associated with such an operator. 

c. Describe the subspace decomposition associated with an operator of the form $I \otimes Q$ where 
$Q$ is a $2 \times 2$ Hermitian operator and $I$ is the $2 \times 2$ identity operator, and give a high-level 
description of such measurements. 

Exercise 4.16. This exercise shows that for any Hermitian operator $O : V \to V$, the direct sum 
of all eigenspaces of $O$ is $V$. 

A unitary operator $U$ satisfies $U^\dagger U = I$. 

a. Show that the columns of a unitary matrix $U$ form an orthonormal set. 

b. Show that if $O$ is Hermitian, then so is $UOU^{-1}$ for any unitary operator $U$. 

c. Show that any operator has at least one eigenvalue $\lambda$ and $\lambda$-eigenvector $v_\lambda$. 

d. Use the result of (c) to show that for any matrix $A : V \to V$, there is a unitary operator $U$ such 
that the matrix for $UAU^{-1}$ is upper triangular (meaning all entries below the diagonal are zero). 

e. Show that for any Hermitian operator $O : V \to V$ with eigenvalues $\lambda_1, \ldots, \lambda_k$, the direct sum 
of the $\lambda_i$-eigenspaces $S_{\lambda_i}$ gives the whole space: 

$$V = S_{\lambda_1} \oplus S_{\lambda_2} \oplus \cdots \oplus S_{\lambda_k}.$$ 


Exercise 4.17. 

a. Show that any state resulting from measuring an unentangled state with a single-qubit 
measurement is still unentangled. 

b. Can other types of measurement produce an entangled state from an unentangled one? If so, 
give an example. If not, give a proof. 

c. Can an unentangled state be obtained by measuring a single qubit of an entangled state? 

Exercise 4.18. Show that if there is no measurement of one of the qubits that gives a single result 
with certainty, then the two qubits are entangled. 
Exercise 4.19. Give an explicit description of the observable $O_\theta$ of section 4.4.2 in both bra/ket 
and matrix notation. 

Exercise 4.20. Let $O_{\theta_1}$ be the single-qubit observable with $+1$-eigenvector 

$$|v_1\rangle = \cos\theta_1|0\rangle + \sin\theta_1|1\rangle$$ 

and $-1$-eigenvector 

$$|v_1^\perp\rangle = -\sin\theta_1|0\rangle + \cos\theta_1|1\rangle.$$ 

Similarly, let $O_{\theta_2}$ be the single-qubit observable with $+1$-eigenvector 

$$|v_2\rangle = \cos\theta_2|0\rangle + \sin\theta_2|1\rangle$$ 

and $-1$-eigenvector 

$$|v_2^\perp\rangle = -\sin\theta_2|0\rangle + \cos\theta_2|1\rangle.$$ 

Let $O$ be the two-qubit observable $O_{\theta_1} \otimes O_{\theta_2}$. We consider various measurements on the EPR state 
$|\psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. We are interested in the probability that the measurements $O_{\theta_1} \otimes I$ and 
$I \otimes O_{\theta_2}$, if they were performed on the state $|\psi\rangle$, would agree on the two qubits in that either both 
qubits are measured in the $+1$-eigenspace or both are measured in the $-1$-eigenspace of their respective 
single-qubit observables. As in example 4.2.5, we are not interested in the specific outcome of the 
two measurements, just whether or not they would agree. The observable $O = O_{\theta_1} \otimes O_{\theta_2}$ gives 
exactly this information. 

a. Find the probability that the measurements $O_{\theta_1} \otimes I$ and $I \otimes O_{\theta_2}$, when performed on $|\psi\rangle$, 
would agree in the sense of both resulting in a $+1$ eigenvector or both resulting in a $-1$ eigenvector. 
(Hint: Use the trigonometric identities $\cos(\theta_1 - \theta_2) = \cos\theta_1\cos\theta_2 + \sin\theta_1\sin\theta_2$ and 
$\sin(\theta_1 - \theta_2) = \sin\theta_1\cos\theta_2 - \cos\theta_1\sin\theta_2$ to obtain a simple form for your answer.) 

b. For what values of $\theta_1$ and $\theta_2$ do the results always agree? 

c. For what values of $\theta_1$ and $\theta_2$ do the results never agree? 

d. For what values of $\theta_1$ and $\theta_2$ do the results agree half the time? 

e. Show that whenever $\theta_1 \ne \theta_2$ and $\theta_1$ and $\theta_2$ are chosen from $\{-60°, 0°, 60°\}$, then the results 
agree 1/4 of the time and disagree 3/4 of the time. 

Exercise 4.21. 

a. Most of the time the effect of performing two measurements, one right after the other, cannot 
be achieved by a single measurement. Find a sequence of two measurements whose effect cannot 
be achieved by a single measurement, and explain why this property is generally true for most 
pairs of measurements. 

b. Describe a sequence of two distinct nontrivial measurements that can be achieved by a single 
measurement. 
c. For each of the measurements specified by the operators A, B, C, and M from examples 4.3.3, 
4.3.4, 4.3.5, and 4.3.6, say whether the measurement can be achieved as a sequence of single-qubit 
measurements. 

d. How does performing the sequence of measurements $Z \otimes I$ followed by $I \otimes Z$ compare with 
performing the single measurement $Z \otimes Z$? 

Exercise 4.22. Show that no matter in which basis the first qubit of an EPR pair $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ 
is measured, the two possible outcomes have equal probability. 
Quantum State Transformations 


The last two chapters discussed encoding information in quantum states and some of the uniquely 
quantum properties of such quantum information, such as entangled states, the exponential state 
space, and quantum measurement. This chapter develops the basic mechanisms for computing on 
quantum information. Computation on quantum information takes place through dynamic trans¬ 
formation of quantum systems. In order to understand quantum computation, we must understand 
which sorts of transformations nature allows and which it does not. This chapter focuses on trans¬ 
formations of a closed quantum system, transformations that map the state space of the quantum 
system to itself. Measurement is not a transformation in this sense. Chapter 10 discusses more 
general transformations, transformations of a subsystem that is part of a larger quantum system. 

This chapter begins with a brief discussion of transformations on general quantum systems, 
and it then focuses on multiple-qubit systems. Section 5.1 discusses the unitarity requirement on 
quantum state transformations and the no-cloning principle. The no-cloning restriction is central 
to both the limitations and the advantages of encoding information in quantum states; for example, 
it underlies the security of quantum cryptographic protocols such as the ones described in sections 
2.4 and 3.4, and it is also vital to the argument of section 4.3.1 that no more than $n$ classical bits' 
worth of information can be extracted from an $n$-qubit system. 

After discussing considerations for transformations of general quantum systems, the chapter 
restricts discussion to $n$-qubit systems and develops building blocks for the standard circuit model 
of quantum computation. Part II uses this model to describe quantum algorithms. All quantum 
transformations on $n$-qubit quantum systems can be expressed as a sequence of transformations on 
single-qubit and two-qubit subsystems. Some quantum state transformations can be implemented 
in terms of these basic gates more easily than others. The efficiency of a quantum transform is 
quantified in terms of the number of one- and two-qubit gates used. Section 5.2 looks at single¬ 
qubit and two-qubit transformations, ways of combining them, and a graphical notation for 
describing sequences of transformations. Section 5.3 describes applications of these simple gates 
to two communication problems: dense coding and quantum state teleportation. Section 5.4 is 
devoted to showing that any quantum transformation can be realized as a sequence of one- and 
two-qubit transformations. Section 5.5 discusses finite sets of gates that can be used to approximate 
all quantum transformations universally. The chapter concludes with a definition of the standard 
circuit model for quantum computation. 
5.1 Unitary Transformations 

In this book, quantum transformation will mean a mapping from the state space of a quantum 
system to itself. Measurements are not quantum transformations in this sense; there are only 
finitely many outcomes, and the result of applying a measurement to a specific state is only 
probabilistic. Chapter 10 considers open quantum systems, systems that are subsystems of a larger 
quantum system, and studies the transformations of subsystems induced by transformations of the 
larger system. In this chapter, we concern ourselves only with transformations of closed quantum 
systems. 

Nature does not allow arbitrary transformations of a quantum system. Nature forces these trans¬ 
formations to respect properties connected to quantum measurement and quantum superposition. 
The transformations must be linear transformations of the vector space associated with the state 
space so that a state that is a superposition of other states goes to the superposition of their images; 
more precisely, linearity means that for any quantum transformation U, 

$$U(a_1|\psi_1\rangle + \cdots + a_k|\psi_k\rangle) = a_1 U|\psi_1\rangle + \cdots + a_k U|\psi_k\rangle$$ 

on any superposition $|\psi\rangle = a_1|\psi_1\rangle + \cdots + a_k|\psi_k\rangle$. Unit length vectors must go to unit length 
vectors, which implies that orthogonal subspaces go to orthogonal subspaces. These properties 
ensure that measuring and then applying a transform to the outcome gives the same result as first 
applying the transform and then measuring in the transformed basis. Specifically, the probability 
of obtaining outcome $U|\phi\rangle$ by first applying $U$ to $|\psi\rangle$ and then measuring with respect to the 
decomposition $\bigoplus U S_i$ is the same as the probability of obtaining $U|\phi\rangle$ by measuring $|\psi\rangle$ with 
respect to the decomposition $\bigoplus S_i$ and then applying $U$. These properties hold if $U$ preserves the 
inner product; for any $|\psi\rangle$ and $|\phi\rangle$, the inner product of their images, $U|\psi\rangle$ and $U|\phi\rangle$, must be 
the same as the inner product between $|\psi\rangle$ and $|\phi\rangle$: 

$$\langle\phi|U^\dagger U|\psi\rangle = \langle\phi|\psi\rangle.$$ 

A straightforward mathematical argument shows that this condition holds for all $|\psi\rangle$ and 
$|\phi\rangle$ only if $U^\dagger U = I$. In other words, for any quantum transformation $U$, its adjoint $U^\dagger$ 
must be equal to its inverse, precisely the condition, $U^\dagger = U^{-1}$, for a linear transformation 
to be unitary. Furthermore, this condition is sufficient; the set of allowed transformations of 
a quantum system corresponds exactly to the set of unitary operators on the complex vector 
space associated with the state space of the quantum system. Since unitary operators preserve 
the inner product, they map orthonormal bases to orthonormal bases. In fact, the converse 
is true: any linear transformation that maps an orthonormal basis to an orthonormal basis is 
unitary. 

Geometrically, all quantum state transformations are rotations of the complex vector space 
associated with the quantum state space. The $i$th column of the matrix for $U$ is the image $U|i\rangle$ of the 
$i$th basis vector, so a transformation $U$ given in matrix form is unitary if and only if 
the columns of its matrix representation are orthonormal. Since $U^\dagger$ is unitary if and only 
if $U$ is, it follows that $U$ is unitary if and only if its rows are orthonormal. The product $U_1 U_2$ of two 
unitary transformations is again unitary. The tensor product $U_1 \otimes U_2$ is a unitary transformation 
of the space $X_1 \otimes X_2$ if $U_1$ and $U_2$ are unitary transformations of $X_1$ and $X_2$ respectively. Linear 
combinations of unitary operators, however, are not in general unitary. 
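
These closure properties are easy to check numerically on small examples. The sketch below is illustrative only: the helper names `dagger`, `matmul`, and `is_unitary` are hypothetical, matrices are plain lists of rows, and the two sample operators (the Hadamard and negation matrices) are chosen only because they are known to be unitary.

```python
import math

def dagger(m):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(m)
    return [[m[j][i].conjugate() for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_unitary(m, tol=1e-12):
    """True when dagger(m) times m equals the identity, up to tol."""
    p = matmul(dagger(m), m)
    n = len(m)
    return all(abs(p[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]   # sample unitary operator
X = [[0, 1], [1, 0]]    # another sample unitary operator

HX = matmul(H, X)
H_plus_X = [[H[i][j] + X[i][j] for j in range(2)] for i in range(2)]

print(is_unitary(H), is_unitary(X))  # True True
print(is_unitary(dagger(H)))         # True: the adjoint is unitary as well
print(is_unitary(HX))                # True: products stay unitary
print(is_unitary(H_plus_X))          # False: a linear combination need not be
```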

The unitarity condition simply ensures that the operator does not violate any general principles 
of quantum theory. It does not imply that a transformation can be implemented efficiently; most 
unitary operators cannot be efficiently implemented, even approximately. In later chapters, par¬ 
ticularly when we examine quantum algorithms, we will concern ourselves with questions about 
the efficiency of certain quantum transformations. 

An obvious consequence of the unitary condition is that every quantum state transformation 
is reversible. Chapter 6 describes work of Charles Bennett, Edward Fredkin, and Tommaso 
Toffoli, done prior to the development of quantum information processing, that shows that all 
classical computations can be made reversible with only a negligible loss of efficiency. Thus, the 
reversibility requirement does not impose an unworkably strict restriction on quantum algorithms. 

In the standard circuit model of quantum computation, all computation is carried out by quantum 
transformations, with measurement used only at the end to read out the results. Since measurement 
can effect changes in quantum states, the dynamics of measurement, rather than quantum state 
transformations, provide an alternative means to achieve computation. Section 13.4 describes an 
alternate, but equally powerful, model of quantum computation in which all computation takes 
place by measurement. 

The phrases quantum transformation or quantum operator refer to unitary operators acting on 
the state space, not measurement operators. While measurements are modeled by operators, the 
behavior of measurement is not modeled by the direct action of the measurement’s Hermitian 
operator on the state space, but rather by the indirect, probabilistic procedure described by the 
measurement postulate of section 4.3.1. One of the least satisfactory aspects of quantum theory 
is that there are two distinct classes of manipulations of quantum states: quantum transforma¬ 
tions and measurement. Section 10.3 describes a tighter, but still unsatisfactory, relation between 
the two. 


5.1.1 Impossible Transformations: The No-Cloning Principle 

This section describes a simple, but important, consequence of the unitary condition: unknown 
quantum states cannot be copied or cloned. In fact, the linearity of unitary transformations alone 
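implies the result. Suppose $U$ is a unitary transformation that clones, in that $U(|a\rangle|0\rangle) = |a\rangle|a\rangle$ 
for all quantum states $|a\rangle$. Let $|a\rangle$ and $|b\rangle$ be two orthogonal quantum states. That $U$ clones means 
$U(|a\rangle|0\rangle) = |a\rangle|a\rangle$ and $U(|b\rangle|0\rangle) = |b\rangle|b\rangle$. Consider $|c\rangle = \frac{1}{\sqrt{2}}(|a\rangle + |b\rangle)$. By linearity, 

$$U(|c\rangle|0\rangle) = \frac{1}{\sqrt{2}}\left(U(|a\rangle|0\rangle) + U(|b\rangle|0\rangle)\right) = \frac{1}{\sqrt{2}}\left(|a\rangle|a\rangle + |b\rangle|b\rangle\right).$$ 

But if $U$ is a cloning transformation then 

$$U(|c\rangle|0\rangle) = |c\rangle|c\rangle = \frac{1}{2}\left(|a\rangle|a\rangle + |a\rangle|b\rangle + |b\rangle|a\rangle + |b\rangle|b\rangle\right),$$ 

which is not equal to $\frac{1}{\sqrt{2}}(|a\rangle|a\rangle + |b\rangle|b\rangle)$. Thus, there is no unitary operation that can reliably 
clone all quantum states. 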
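
The mismatch can be seen concretely by choosing specific states. The following sketch assumes, purely for illustration, that $|a\rangle = |0\rangle$ and $|b\rangle = |1\rangle$; the helper `kron_vec` and the variable names are hypothetical. It compares the vector that linearity forces with the vector that cloning would require.

```python
import math

def kron_vec(u, v):
    """Tensor product of two state vectors given as lists of amplitudes."""
    return [x * y for x in u for y in v]

a = [1.0, 0.0]                           # assumed example: |a> = |0>
b = [0.0, 1.0]                           # assumed example: |b> = |1>
s = 1 / math.sqrt(2)
c = [s * (x + y) for x, y in zip(a, b)]  # |c> = (|a> + |b>)/sqrt(2)

# What linearity forces: (|a>|a> + |b>|b>)/sqrt(2)
by_linearity = [s * (x + y) for x, y in zip(kron_vec(a, a), kron_vec(b, b))]
# What cloning would require: |c>|c>
by_cloning = kron_vec(c, c)

# Overlap of the two candidate results; 1 would mean they are the same state.
overlap = abs(sum(x * y for x, y in zip(by_linearity, by_cloning)))

print(by_linearity)  # amplitudes only on |00> and |11>
print(by_cloning)    # equal amplitudes on all four basis vectors
print(overlap)       # about 0.707, so the two states differ
```

Since the overlap is strictly less than 1, the two vectors do not even represent the same state up to a global phase.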

The no-cloning theorem tells us that it is impossible to clone a specific unknown quantum state 
reliably. It does not preclude the construction of a known quantum state from a known quantum 
state. It is possible to perform an operation that appears to be copying the state in one basis but 
does not do so in others. For example, it is possible to obtain $n$ particles in an entangled state 
$a|00\ldots0\rangle + b|11\ldots1\rangle$ from an unknown state $a|0\rangle + b|1\rangle$. But it is not possible to create the $n$-particle 
state $(a|0\rangle + b|1\rangle) \otimes \cdots \otimes (a|0\rangle + b|1\rangle)$ from an unknown state $a|0\rangle + b|1\rangle$. 

5.2 Some Simple Quantum Gates 

Just as for classical computation, it is a boon to quantum computation, both for implementation and 
analysis, that arbitrarily complex computations can be achieved by composing simple elements. 
Section 5.4 shows that any quantum state transformation on an $n$-qubit system can be realized 
using a sequence of one- and two-qubit quantum state transformations. We will call any quantum 
state transformation that acts on only a small number of qubits a quantum gate. Sequences of quan¬ 
tum gates are called quantum gate arrays or quantum circuits. 

In the quantum-information-processing literature, gates are mathematical abstractions useful 
for describing quantum algorithms; quantum gates do not necessarily correspond to physical 
objects, as they do in the classical case. So the gate terminology and its accompanying graphical 
notation must not be taken too literally. For solid state or optical implementations, there may be 
actual physical gates, but in NMR and ion trap implementations, the qubits are stationary particles, 
and the gates are operations on these particles using magnetic fields or laser pulses. For these 
implementations, gates operate on a physical register of qubits. 

From a practical point of view, the standard description of computation in terms of one- and 
two-qubit gates leaves something to be desired. Ideally, we would write all our computations in 
terms of gates that are easy to implement physically and are robust, but we do not yet know 
which ones these are. Furthermore, in order to realize physically a quantum computer capable of 
performing arbitrary quantum transformations, it would be convenient to have only finitely many 
gates that could generate all unitary transformations. Unfortunately, such a set is impossible; there 
are uncountably many quantum transformations, and a finite set of generators can only generate 
countably many elements. Section 5.5 shows that it is possible, however, for finite sets of gates 
to generate arbitrarily close approximations to all unitary transformations. A number of such sets 
are known, but it is unclear which of these will be most practical from a physical implementation 
point of view. For analyzing quantum algorithms, it is useful to have a standard set of gates with 
which to analyze the efficiency of quantum algorithms. The set we use includes all one-qubit 
gates together with the two-qubit gate described in section 5.2.4. 
Figure 5.1 

A sample graphical representation for a three-qubit quantum gate array. Data flow left to right through the circuit. 


Graphical notation, representing series of quantum state transformations acting on various com¬ 
binations of qubits, is commonly used to describe sequences of transformations and to analyze the 
resulting algorithms. Simple transformations are graphically represented by appropriately labeled 
boxes which are connected to form more complex circuits. A sample graphical representation is 
shown in figure 5.1. Each horizontal line corresponds to a qubit. The transformations on the left 
are performed first, and the processing proceeds from left to right. The boxes labeled with $U_0$, 
$U_1$, and $U_3$ correspond to single-qubit transformations, while the one labeled $U_2$ corresponds to 
a two-qubit transformation. When we talk about applying an operator $U$ to qubit $i$ of an $n$-qubit 
quantum system, we mean that we apply the operator $I \otimes \cdots \otimes I \otimes U \otimes I \otimes \cdots \otimes I$ to the entire 
system, where $I$ is the single-qubit identity operator, applied to each of the other qubits of the 
system. 
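
As a concrete illustration of this convention, the sketch below builds the full operator for a single-qubit gate acting on one qubit of an $n$-qubit register. The names `kron` and `on_qubit` are hypothetical, and the 0-based numbering of qubits from the left is an assumption made for the example, not a convention taken from the text.

```python
def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def on_qubit(u, i, n):
    """I (x) ... (x) U (x) ... (x) I with U at position i (0-based from the left)."""
    identity = [[1, 0], [0, 1]]
    result = [[1]]  # 1x1 identity, the neutral element for kron
    for k in range(n):
        result = kron(result, u if k == i else identity)
    return result

X = [[0, 1], [1, 0]]
op = on_qubit(X, 1, 3)  # X on the middle qubit of a three-qubit system

# The basis vector |b0 b1 b2> has index 4*b0 + 2*b1 + b2, so the first
# column of op is the image of |000>, which should be |010> (index 2).
image_of_000 = [row[0] for row in op]
print(len(op), len(op[0]))  # 8 8
print(image_of_000)         # [0, 0, 1, 0, 0, 0, 0, 0]
```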

The remainder of this section describes a variety of frequently used quantum gates. 

5.2.1 The Pauli Transformations 

The Pauli transformations are the most commonly used single-qubit transformations: 

$$I: \; |0\rangle\langle 0| + |1\rangle\langle 1|$$ 

$$X: \; |1\rangle\langle 0| + |0\rangle\langle 1|$$ 

$$Y: \; -|1\rangle\langle 0| + |0\rangle\langle 1|$$ 

$$Z: \; |0\rangle\langle 0| - |1\rangle\langle 1|$$ 

where $I$ is the identity transformation, $X$ is negation (the classical not operation on $|0\rangle$ and $|1\rangle$ 
viewed as classical bits), $Z$ changes the relative phase of a superposition in the standard basis, 
and $Y = ZX$ is a combination of negation and phase change. In graphical notation, these gates 
are represented by boxes labeled appropriately. 

There is variation in the literature as to which transformations are the Pauli transformations, and 
in the notation used. The main discrepancy is whether $-i(|0\rangle\langle 1| - |1\rangle\langle 0|)$ is considered the Pauli 
transformation instead of $Y = |0\rangle\langle 1| - |1\rangle\langle 0|$, as we do here. The operator $iY$ is Hermitian, which 
is a useful property in some settings, for example, if we wanted to use it to describe measurement. 
Also, sometimes the notation $\sigma_x$, $\sigma_y$, and $\sigma_z$ is used instead. Throughout this book, we use $I$, 
$X$, $Y$, and $Z$ for the Pauli operators representing single-qubit transformations. In chapter 10, we 
use the notation $\sigma_x = X$, $\sigma_y = -iY$, and $\sigma_z = Z$ when the Pauli operators are used to describe 
quantum states. 
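
The relations among these conventions can be verified directly. The following is a small sketch with hypothetical helper names (`matmul`, `scale`, `is_hermitian`); it writes each operator as a matrix in the standard basis, where the entry in row $r$ and column $c$ is the coefficient of $|r\rangle\langle c|$.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scale(c, m):
    """Multiply every entry of matrix m by the scalar c."""
    return [[c * x for x in row] for row in m]

def is_hermitian(m):
    """True when m equals its own conjugate transpose."""
    n = len(m)
    return all(m[i][j] == complex(m[j][i]).conjugate()
               for i in range(n) for j in range(n))

X = [[0, 1], [1, 0]]    # |1><0| + |0><1|
Y = [[0, 1], [-1, 0]]   # -|1><0| + |0><1|
Z = [[1, 0], [0, -1]]   # |0><0| - |1><1|

print(matmul(Z, X) == Y)            # True: Y = ZX
print(is_hermitian(Y))              # False: Y itself is not Hermitian
print(is_hermitian(scale(1j, Y)))   # True: iY is Hermitian
print(is_hermitian(scale(-1j, Y)))  # True: so is -iY
```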


5.2.2 The Hadamard Transformation 

Another important single-qubit transformation is the Hadamard transformation 


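$$H = \frac{1}{\sqrt{2}}\left(|0\rangle\langle 0| + |1\rangle\langle 0| + |0\rangle\langle 1| - |1\rangle\langle 1|\right),$$ 

or 

$$H: \; |0\rangle \to |+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$$ 

$$\phantom{H: \;} |1\rangle \to |-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle),$$ 

which produces an even superposition of $|0\rangle$ and $|1\rangle$ from either of the standard basis elements. 
Note $HH = I$. In the standard basis, the matrix for the Hadamard transformation is 

$$\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$ 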
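
As a quick numerical check of these statements, the sketch below applies the Hadamard matrix to the standard basis vectors and then applies it a second time. The helper name `apply` and the variable names are hypothetical.

```python
import math

s = 1 / math.sqrt(2)
H = [[s, s],
     [s, -s]]

def apply(m, v):
    """Multiply matrix m (list of rows) by the column vector v."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

ket0, ket1 = [1, 0], [0, 1]
plus = apply(H, ket0)    # amplitudes of |+> = (|0> + |1>)/sqrt(2)
minus = apply(H, ket1)   # amplitudes of |-> = (|0> - |1>)/sqrt(2)

# Applying H a second time should return the original basis vectors,
# up to floating-point rounding, since HH = I.
back0 = apply(H, plus)
back1 = apply(H, minus)

print(plus, minus)
print(back0, back1)
```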



5.2.3 Multiple-Qubit Transformations from Single-Qubit Transformations 

Multiple-qubit transformations can be constructed as tensor products of single-qubit transforma¬ 
tions. These transformations are uninteresting as multiple-qubit transformations in the sense that 
they are equivalent to performing the single-qubit transformations on each of the qubits separately 
in some order. For example, $U \otimes V$ can be obtained by first applying $U \otimes I$ and then $I \otimes V$. 

More interesting are those multiple-qubit transformations that can change the entanglement 
between qubits of the system. Entanglement is not a local property in the sense that transformations 
that act separately on two or more subsystems cannot affect the entanglement between those 
subsystems. More precisely, let $|\psi\rangle$ be a two-qubit state and $U$ and $V$ be single-qubit unitary 
transformations. Then $(U \otimes V)|\psi\rangle$ is entangled if and only if $|\psi\rangle$ is. The widely used class of 
two-qubit controlled gates discussed in the next section illustrates the effects transformations can 
have on entanglement. 
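
The claim that a tensor product of single-qubit transformations is the same as applying them one qubit at a time can be checked on an example. In this sketch the helper names `kron` and `matmul` are hypothetical, and the Hadamard and negation matrices stand in for arbitrary $U$ and $V$.

```python
import math

def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

s = 1 / math.sqrt(2)
I2 = [[1, 0], [0, 1]]
U = [[s, s], [s, -s]]   # example choice for U
V = [[0, 1], [1, 0]]    # example choice for V

direct = kron(U, V)
# Applying U (x) I first and I (x) V second is the product (I (x) V)(U (x) I).
sequential = matmul(kron(I2, V), kron(U, I2))

max_diff = max(abs(direct[i][j] - sequential[i][j])
               for i in range(4) for j in range(4))
print(max_diff)  # zero up to rounding: the two operators coincide
```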
5.2.4 The Controlled-NOT and Other Singly Controlled Gates 

The controlled-NOT gate, $C_{not}$, acts on the standard basis for a two-qubit system, with $|0\rangle$ and 
$|1\rangle$ viewed as classical bits, as follows: it flips the second bit if the first bit is 1 and leaves it 
unchanged otherwise. The $C_{not}$ transformation has representation 

$$\begin{aligned} C_{not} &= |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes X \\ &= |0\rangle\langle 0| \otimes (|0\rangle\langle 0| + |1\rangle\langle 1|) + |1\rangle\langle 1| \otimes (|1\rangle\langle 0| + |0\rangle\langle 1|) \\ &= |00\rangle\langle 00| + |01\rangle\langle 01| + |11\rangle\langle 10| + |10\rangle\langle 11|, \end{aligned}$$ 

from which it is easy to read off its effect on the standard basis elements: 

$$\begin{aligned} C_{not}: \; |00\rangle &\to |00\rangle \\ |01\rangle &\to |01\rangle \\ |10\rangle &\to |11\rangle \\ |11\rangle &\to |10\rangle. \end{aligned}$$ 

The matrix representation (in the standard basis) for $C_{not}$ is 

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$ 


Observe that $C_{not}$ is unitary and is its own inverse. Furthermore, the $C_{not}$ gate cannot be 
decomposed into a tensor product of two single-qubit transformations. 

The importance of the $C_{not}$ gate for quantum computation stems from its ability to change 
the entanglement between two qubits. For example, it takes the unentangled two-qubit state 
$\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)|0\rangle$ to the entangled state $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$: 

$$C_{not}\left(\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \otimes |0\rangle\right) = C_{not}\left(\frac{1}{\sqrt{2}}(|00\rangle + |10\rangle)\right) = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle).$$ 


Similarly, since it is its own inverse, it can take an entangled state to an unentangled one. 
The controlled-NOT gate is so common that it has its own graphical notation. 


The open circle indicates the control bit, the x indicates negation of the target bit, and the line 
between them indicates that the negation is conditional, depending on the value of the control 
bit. Some authors use a solid circle to indicate negative control, in which the target bit is toggled 
when the control bit is 0 instead of 1. 
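
The properties of the controlled-NOT gate described in this section can be confirmed numerically. The sketch below is illustrative: the helper names are hypothetical, and the entanglement test relies on the standard fact that a two-qubit state with amplitudes $a_{00}, a_{01}, a_{10}, a_{11}$ is a product state exactly when $a_{00}a_{11} - a_{01}a_{10} = 0$.

```python
import math

def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(m, v):
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

P0 = [[1, 0], [0, 0]]   # |0><0|
P1 = [[0, 0], [0, 1]]   # |1><1|
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]

# C_not = |0><0| (x) I + |1><1| (x) X
A, B = kron(P0, I2), kron(P1, X)
CNOT = [[A[i][j] + B[i][j] for j in range(4)] for i in range(4)]

def is_entangled(state, tol=1e-12):
    """Product states satisfy a00*a11 - a01*a10 = 0; entangled states do not."""
    a00, a01, a10, a11 = state
    return abs(a00 * a11 - a01 * a10) > tol

s = 1 / math.sqrt(2)
start = [s, 0, s, 0]        # (|0> + |1>)/sqrt(2) (x) |0>
end = apply(CNOT, start)    # amplitudes of (|00> + |11>)/sqrt(2)

print(CNOT)
print(matmul(CNOT, CNOT) == kron(I2, I2))       # True: its own inverse
print(is_entangled(start), is_entangled(end))   # False True
print(is_entangled(apply(CNOT, end)))           # False: back to unentangled
```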
A useful class of two-qubit controlled gates, which generalizes the $C_{not}$ gate, consists of gates 
that perform a single-qubit transformation $Q$ on the second qubit when the first qubit is $|1\rangle$ and 
do nothing when it is $|0\rangle$. These controlled gates have their own graphical representation. 

We use the following shorthand for these transformations: 

$$\bigwedge Q = |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes Q.$$ 

The transformation $C_{not}$, for example, becomes $\bigwedge X$ in this notation. In the standard 
computational basis, the two-qubit operator $\bigwedge Q$ is represented by the $4 \times 4$ matrix 

$$\begin{pmatrix} I & 0 \\ 0 & Q \end{pmatrix}.$$ 

Let us look in more depth at one of these controlled gates, the controlled phase shift $\bigwedge e^{i\theta}$, 
where $e^{i\theta}$ is shorthand for $e^{i\theta}I$. In the standard basis, the controlled phase shift changes the phase 
of the second bit if and only if the control bit is one: 

\[
\bigwedge e^{i\theta} = |00\rangle\langle 00| + |01\rangle\langle 01| + e^{i\theta}|10\rangle\langle 10| + e^{i\theta}|11\rangle\langle 11|.
\]

Its effect on the standard basis elements is as follows: 

\[
\begin{aligned}
\bigwedge e^{i\theta}: \quad |00\rangle &\to |00\rangle \\
|01\rangle &\to |01\rangle \\
|10\rangle &\to e^{i\theta}|10\rangle \\
|11\rangle &\to e^{i\theta}|11\rangle
\end{aligned}
\]

and it has matrix representation 

\[
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & e^{i\theta} & 0 \\
0 & 0 & 0 & e^{i\theta}
\end{pmatrix}.
\]

The controlled phase shift makes use of a single-qubit transformation that was a physically 
meaningless global phase shift when applied to a single-qubit system, but when used as part of 
a conditional transformation, this phase shift becomes nontrivial, changing the relative phase 
between elements of a superposition. For example, it takes 

\[
\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle) \to \frac{1}{\sqrt{2}}(|00\rangle + e^{i\theta}|11\rangle).
\]
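The relative phase in this example can be checked numerically. The following is a minimal Python sketch using plain lists of complex amplitudes; the helper names `controlled_phase` and `apply` and the choice $\theta = \pi/2$ are illustrative assumptions, not part of the text.

```python
import cmath
import math

def controlled_phase(theta):
    """4x4 matrix of the controlled phase shift in the basis |00>, |01>, |10>, |11>."""
    p = cmath.exp(1j * theta)
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, p, 0],
            [0, 0, 0, p]]

def apply(matrix, vec):
    """Multiply a square matrix by an amplitude vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

theta = math.pi / 2                  # illustrative angle
s = 1 / math.sqrt(2)
state = [s, 0, 0, s]                 # amplitudes of (|00> + |11>)/sqrt(2)
result = apply(controlled_phase(theta), state)
# result holds the amplitudes of (|00> + e^{i theta}|11>)/sqrt(2):
# only the |11> amplitude picks up the phase, so the phase is relative, not global.
```

Because only one term of the superposition acquires the factor $e^{i\theta}$, the output is a different quantum state from the input whenever $e^{i\theta} \neq 1$.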


Graphical icons can be combined into quantum circuits. The following circuit, for instance, 
swaps the value of the two bits. 

[Figure: swap circuit]

In other words, this swap circuit takes 

\[
\begin{aligned}
|00\rangle &\mapsto |00\rangle \\
|01\rangle &\mapsto |10\rangle \\
|10\rangle &\mapsto |01\rangle \\
|11\rangle &\mapsto |11\rangle,
\end{aligned}
\]

and $|\psi\rangle|\phi\rangle \mapsto |\phi\rangle|\psi\rangle$ for all single-qubit states $|\psi\rangle$ and $|\phi\rangle$. 
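One common way to realize a swap, assumed here because the circuit figure is not reproduced in this text, is three controlled-NOT gates with alternating control and target. The Python sketch below multiplies the three matrices and checks the result; the names `CNOT_12`, `CNOT_21`, and `swap_product` are hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Basis order |00>, |01>, |10>, |11>; the first qubit is the most significant bit.
CNOT_12 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # control 1, target 2
CNOT_21 = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]  # control 2, target 1

# Circuits are read left to right, so the rightmost matrix acts first.
SWAP = matmul(CNOT_12, matmul(CNOT_21, CNOT_12))

def swap_product(psi, phi):
    """Apply SWAP to the product state |psi>|phi> and return its four amplitudes."""
    vec = [p * q for p in psi for q in phi]
    return [sum(m * v for m, v in zip(row, vec)) for row in SWAP]

swapped = swap_product([0.6, 0.8], [1, 0])   # amplitudes of |phi>|psi>
```

The product of the three matrices is the permutation matrix that exchanges $|01\rangle$ and $|10\rangle$, which is exactly the table above.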

Three cautions are in order. The first concerns the use of a basis to specify the transformation. 
The second concerns the basis dependence of the notion of control. The third suggests care in 
interpreting the graphical notation for quantum circuits. 

Caution 1: Phases in Specifications of Transformations Section 3.1.3 discussed the important 
distinction between the quantum state space (projective space) and the associated complex vector 
space. We need to keep this distinction in mind when interpreting the standard ways quantum state 
transformations are specified. A unitary transformation on the complex vector space is completely 
determined by its action on a basis. It is not, however, completely determined by 
specifying which states the states corresponding to the basis vectors are sent to, a subtle distinction. For 
example, the controlled phase shift takes the four quantum states represented by $|00\rangle$, $|01\rangle$, $|10\rangle$, 
and $|11\rangle$ to themselves; $|10\rangle$ and $e^{i\theta}|10\rangle$ represent exactly the same quantum state, and so do $|11\rangle$ 
and $e^{i\theta}|11\rangle$. As we saw above, however, this transformation is not the identity transformation 
since it takes $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ to $\frac{1}{\sqrt{2}}(|00\rangle + e^{i\theta}|11\rangle)$. To avoid mistakes, remember that notation 
such as 

\[
\begin{aligned}
|00\rangle &\to |00\rangle \\
|01\rangle &\to |01\rangle \\
|10\rangle &\to e^{i\theta}|10\rangle \\
|11\rangle &\to e^{i\theta}|11\rangle
\end{aligned}
\]

is used to specify a unitary transformation on the complex vector space in terms of vectors in 
that vector space, not in terms of the states corresponding to these vectors. Specifying that the 
vector $|0\rangle$ goes to the vector $-|1\rangle$ is different from specifying that $|0\rangle$ goes to $|1\rangle$ because the two 
vectors $-|1\rangle$ and $|1\rangle$ are different vectors even if they correspond to the same state. The quantum 
transformation on the state space is easily derived from the unitary transformation on the associated 
complex vector space. 

Caution 2: Basis Dependence of the Notion of Control The notion of the control bit and the target 
bit is a carryover from the classical gate and should not be taken too literally. In the standard basis, 
the $C_{not}$ operator behaves exactly as the classical gate does on classical bits. However, one should 
not conclude that the control bit is never changed. When the input qubits are not in one of the 
standard basis states, the effect of the controlled gate can be somewhat counterintuitive. For 
example, consider the $C_{not}$ gate in the Hadamard basis $\{|+\rangle, |-\rangle\}$: 

\[
\begin{aligned}
C_{not}: \quad |{+}{+}\rangle &\to |{+}{+}\rangle \\
|{+}{-}\rangle &\to |{-}{-}\rangle \\
|{-}{+}\rangle &\to |{-}{+}\rangle \\
|{-}{-}\rangle &\to |{+}{-}\rangle.
\end{aligned}
\]

In the Hadamard basis, it is the state of the second qubit that remains unchanged, and the state 
of the first qubit that is flipped depending on the state of the second bit. Thus, in this basis the 
sense of which bit is the control bit and which the target bit has been reversed. But we have 
not changed the transformation at all, only the way we are thinking about it. Furthermore, in 
most bases, we do not see a control bit or a target bit at all. For example, as we have seen, the 
controlled-NOT transforms $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)|0\rangle$ to $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. In this case the controlled-NOT 
entangles the qubits so that it is not possible to talk about their states separately. 
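The Hadamard-basis table can be reproduced by direct calculation. The sketch below is a minimal Python illustration; the names `STATES`, `tensor`, and `cnot_in_hadamard_basis` are hypothetical, and the basis order $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ with the first qubit as control is an assumption matching the usual matrix for $C_{not}$.

```python
import math

s = 1 / math.sqrt(2)
STATES = {"+": [s, s], "-": [s, -s]}     # Hadamard basis vectors |+> and |->
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

def tensor(u, v):
    """Amplitudes of the product state |u>|v>."""
    return [a * b for a in u for b in v]

def apply(matrix, vec):
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]

def close(u, v):
    return all(abs(a - b) < 1e-12 for a, b in zip(u, v))

def cnot_in_hadamard_basis():
    """Map each label such as '+-' to the label of its image under CNOT."""
    labels = [a + b for a in "+-" for b in "+-"]
    table = {}
    for label in labels:
        image = apply(CNOT, tensor(STATES[label[0]], STATES[label[1]]))
        for candidate in labels:
            target = tensor(STATES[candidate[0]], STATES[candidate[1]])
            if close(image, target):
                table[label] = candidate
    return table

table = cnot_in_hadamard_basis()
```

In the resulting table the second label never changes, while the first label flips exactly when the second is `-`, which is the reversal of roles described above.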

A related fact, which we will use in constructing algorithms and in quantum error correction, 
is that the following two circuits are equivalent: 


Caution 3: Reading circuit diagrams The graphical representation of quantum circuits can be 
misleading if one is not careful to interpret it properly. In particular, one cannot determine the 
effect the transformation has on the input qubits, even if they are all in standard basis states, by 
simply looking at the line in the diagram corresponding to that qubit. Let us look at the circuit 



acting on the input state $|0\rangle|0\rangle$. Since the Hadamard transformation is its own inverse, it might at 
first appear that the first qubit’s state would remain unchanged by the transformation. But it does 
not. Recall from caution 2 that the controlled-NOT gate does not leave the first qubit unaffected in 
general. In fact, this circuit takes the input state $|00\rangle$ to $\frac{1}{2}(|00\rangle + |10\rangle + |01\rangle - |11\rangle)$, an effect 
that cannot be seen immediately from the circuit and so must be explicitly calculated. 
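The explicit calculation is short enough to script. The Python sketch below assumes the circuit in question applies a Hadamard gate to the first qubit, then a controlled-NOT with the first qubit as control, then a second Hadamard gate to the first qubit; that reading is inferred from the surrounding discussion and the stated result, since the figure is not reproduced here.

```python
import math

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
I2 = [[1, 0], [0, 1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

def kron(a, b):
    """Tensor (Kronecker) product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def apply(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

H_FIRST = kron(H, I2)                    # Hadamard on the first qubit only
state = [1, 0, 0, 0]                     # |00>
for gate in (H_FIRST, CNOT, H_FIRST):    # gates in the order they are applied
    state = apply(gate, state)
# state now holds the amplitudes of |00>, |01>, |10>, |11> after the circuit
```

All four amplitudes have magnitude $1/2$ and only the $|11\rangle$ amplitude is negative, so the first qubit does not return to $|0\rangle$ even though $HH = I$.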

5.3 Applications of Simple Gates 

For many years, EPR pairs, and entanglement more generally, were viewed as quantum mechanical 
oddities of merely theoretical interest. Quantum information processing changes that perception 
by providing practical applications of entanglement. Two communications applications, 
dense coding and teleportation, illustrate the usefulness of EPR pairs when used together with a 
few simple quantum gates. 

Dense coding uses one quantum bit together with a shared EPR pair to encode and transmit 
two classical bits. Since EPR pairs can be distributed ahead of time, only one qubit needs to be 
physically transmitted to communicate two bits of information. This result is surprising, since, as 
section 2.3 explained, only one classical bit’s worth of information can be extracted from a qubit. 
Teleportation is the opposite of dense coding in that it uses two classical bits to transmit the state 
of a single qubit. Teleportation is surprising in two respects. In spite of the no-cloning principle 
of quantum mechanics, there exists a mechanism for the transmission of an unknown quantum 
state. Also, teleportation shows that two classical bits suffice to communicate a qubit state that 
can be in any one of an infinite number of possible states. 

The key to both dense coding and teleportation is the use of entangled particles. The initial 
setup is the same for both processes. Alice and Bob wish to communicate. Each is sent one of the 
entangled particles making up an EPR pair 

\[
|\psi_0\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle).
\]

Suppose Alice is sent the first particle, and Bob the second: 

\[
|\psi_0\rangle = \frac{1}{\sqrt{2}}(|0_A\rangle|0_B\rangle + |1_A\rangle|1_B\rangle).
\]

Alice can perform transformations only on her particle, and Bob can perform transformations 
only on his, until Alice sends Bob her particle or vice versa. In other words, until a particle is 
transmitted between them, Alice can perform transformations only of the form $Q \otimes I$ on the EPR 
pair, where $Q$ is a single-qubit transformation, and Bob transformations only of the form $I \otimes Q$. 
More generally, for $K = 2^k$, let $I^{(K)}$ be the $2^k \times 2^k$ identity matrix. If Alice has $n$ qubits and Bob 
has $m$ qubits, then, with $N = 2^n$ and $M = 2^m$, Alice can perform transformations only of the form 
$U \otimes I^{(M)}$, where $U$ is an $n$-qubit transformation, and Bob can perform transformations only of 
the form $I^{(N)} \otimes U$, where $U$ is an $m$-qubit transformation. 


5.3.1 Dense Coding 



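[Figure: dense coding setup, showing the EPR source]

Alice Alice wishes to transmit the state of two classical bits encoding one of the numbers 0 through 
3. Depending on this number, Alice performs one of the Pauli transformations $\{I, X, Y, Z\}$ on her 
qubit of the entangled pair $|\psi_0\rangle$. The resulting state is shown in the following table. 

\[
\begin{array}{cll}
\text{Value} & \text{Transformation} & \text{New state} \\
0 & |\psi_0\rangle = (I \otimes I)|\psi_0\rangle & \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle) \\
1 & |\psi_1\rangle = (X \otimes I)|\psi_0\rangle & \frac{1}{\sqrt{2}}(|10\rangle + |01\rangle) \\
2 & |\psi_2\rangle = (Z \otimes I)|\psi_0\rangle & \frac{1}{\sqrt{2}}(|00\rangle - |11\rangle) \\
3 & |\psi_3\rangle = (Y \otimes I)|\psi_0\rangle & \frac{1}{\sqrt{2}}(-|10\rangle + |01\rangle)
\end{array}
\]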

Alice then sends her qubit to Bob. 

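Bob To decode the information, Bob applies a controlled-NOT to the two qubits of the entangled 
pair and then applies the Hadamard transformation $H$ to the first qubit: 

\[
\begin{array}{rcccl}
\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle) & \xrightarrow{C_{not}} & \frac{1}{\sqrt{2}}(|00\rangle + |10\rangle) = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \otimes |0\rangle & \xrightarrow{H \otimes I} & |00\rangle \\
\frac{1}{\sqrt{2}}(|10\rangle + |01\rangle) & \xrightarrow{C_{not}} & \frac{1}{\sqrt{2}}(|11\rangle + |01\rangle) = \frac{1}{\sqrt{2}}(|1\rangle + |0\rangle) \otimes |1\rangle & \xrightarrow{H \otimes I} & |01\rangle \\
\frac{1}{\sqrt{2}}(|00\rangle - |11\rangle) & \xrightarrow{C_{not}} & \frac{1}{\sqrt{2}}(|00\rangle - |10\rangle) = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) \otimes |0\rangle & \xrightarrow{H \otimes I} & |10\rangle \\
\frac{1}{\sqrt{2}}(-|10\rangle + |01\rangle) & \xrightarrow{C_{not}} & \frac{1}{\sqrt{2}}(-|11\rangle + |01\rangle) = \frac{1}{\sqrt{2}}(-|1\rangle + |0\rangle) \otimes |1\rangle & \xrightarrow{H \otimes I} & |11\rangle.
\end{array}
\]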

Bob then measures the two qubits in the standard basis to obtain the two-bit binary encoding 
of the number Alice wished to send. 
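The whole dense coding protocol fits in a few lines of simulation. The Python sketch below is illustrative only: the function name `dense_coding` is hypothetical, and the matrix used for $Y$ follows the sign convention implied by the table above ($Y|0\rangle = -|1\rangle$, $Y|1\rangle = |0\rangle$), which differs from some other texts.

```python
import math

s = 1 / math.sqrt(2)
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
Y = [[0, 1], [-1, 0]]      # convention inferred from the table: Y|0> = -|1>, Y|1> = |0>
H = [[s, s], [s, -s]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

def kron(a, b):
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def apply(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def dense_coding(value):
    """Encode value in 0..3 on Alice's qubit, decode on Bob's side, return his outcome."""
    psi0 = [s, 0, 0, s]                                    # shared EPR pair
    encoded = apply(kron([I2, X, Z, Y][value], I2), psi0)  # Alice acts on her qubit only
    decoded = apply(kron(H, I2), apply(CNOT, encoded))     # Bob: CNOT, then H on qubit 1
    probs = [abs(a) ** 2 for a in decoded]
    return max(range(4), key=probs.__getitem__)            # the single certain outcome

outcomes = [dense_coding(v) for v in range(4)]
```

After decoding, the state is a single standard basis vector, so Bob's measurement is deterministic and returns the index Alice encoded.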

5.3.2 Quantum Teleportation 

The objective of teleportation is to transmit enough information, using only classical bits, about 
the quantum state of a particle that a receiver can reconstruct the exact quantum state. Since the 
no-cloning principle of quantum mechanics means that a quantum state cannot be copied, the 
quantum state of the original particle cannot be preserved. It is this property—that the original 
state at the source must be destroyed in the course of creating the state at the target—that gives 
quantum teleportation its name. 

Alice Alice has a qubit whose state $|\phi\rangle = a|0\rangle + b|1\rangle$ she does not know. She wants to send this 
state to Bob through classical channels. As in the setup for the dense coding application, Alice 
and Bob each possess one qubit of an entangled pair 

\[
|\psi_0\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle).
\]

The starting state is the three-qubit quantum state 

\[
\begin{aligned}
|\phi\rangle \otimes |\psi_0\rangle &= \frac{1}{\sqrt{2}}\bigl(a|0\rangle \otimes (|00\rangle + |11\rangle) + b|1\rangle \otimes (|00\rangle + |11\rangle)\bigr) \\
&= \frac{1}{\sqrt{2}}\bigl(a|000\rangle + a|011\rangle + b|100\rangle + b|111\rangle\bigr).
\end{aligned}
\]
v 2 

Alice controls the first two qubits and Bob controls the last one. 

Alice applies the decoding step used by Bob in the dense coding scenario to the combined state 
of the qubit $|\phi\rangle$ to be transmitted and her half of the entangled pair. In other words, Alice now 
applies $C_{not} \otimes I$ followed by $H \otimes I \otimes I$ to this state to obtain 

\[
\begin{aligned}
(H \otimes I \otimes I)&(C_{not} \otimes I)(|\phi\rangle \otimes |\psi_0\rangle) \\
&= (H \otimes I \otimes I)\frac{1}{\sqrt{2}}\bigl(a|000\rangle + a|011\rangle + b|110\rangle + b|101\rangle\bigr) \\
&= \frac{1}{2}\bigl(a(|000\rangle + |011\rangle + |100\rangle + |111\rangle) + b(|010\rangle + |001\rangle - |110\rangle - |101\rangle)\bigr) \\
&= \frac{1}{2}\bigl(|00\rangle(a|0\rangle + b|1\rangle) + |01\rangle(a|1\rangle + b|0\rangle) + |10\rangle(a|0\rangle - b|1\rangle) + |11\rangle(a|1\rangle - b|0\rangle)\bigr).
\end{aligned}
\]

Alice measures the first two qubits and obtains one of the four standard basis states $|00\rangle$, $|01\rangle$, 
$|10\rangle$, and $|11\rangle$ with equal probability. Depending on the result of her measurement, the quantum 
state of Bob’s qubit is projected to $a|0\rangle + b|1\rangle$, $a|1\rangle + b|0\rangle$, $a|0\rangle - b|1\rangle$, or $a|1\rangle - b|0\rangle$. Alice 
sends the result of her measurement as two classical bits to Bob. 

After these transformations, crucial information about the original state $|\phi\rangle$ is contained in 
Bob’s qubit. There is now nothing Alice can do on her own to reconstruct the original state of her 
qubit. In fact, the no-cloning principle implies that at any given time, only one of Alice or Bob 
can reconstruct the original quantum state. 

Bob When Bob receives the two classical bits from Alice, he knows how the state of his half 
of the entangled pair compares to the original state of Alice’s qubit. Bob can reconstruct the 
original state of Alice’s qubit, $|\phi\rangle$, by applying the appropriate decoding transformation to his 
qubit, originally part of the entangled pair. The following table shows the state of Bob’s qubit 
before the decoding has taken place and the decoding operator Bob should use depending on the 
value of the bits he received from Alice. 

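\[
\begin{array}{lcc}
\text{State} & \text{Bits received} & \text{Decoding} \\
a|0\rangle + b|1\rangle & 00 & I \\
a|1\rangle + b|0\rangle & 01 & X \\
a|0\rangle - b|1\rangle & 10 & Z \\
a|1\rangle - b|0\rangle & 11 & Y
\end{array}
\]

After decoding, Bob’s qubit will be in the quantum state $a|0\rangle + b|1\rangle$ in which Alice’s qubit started. 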
This decoding step is the encoding step of dense coding, and the encoding step was the decoding 
step of dense coding, so teleportation and dense coding are in some sense inverses of each other. 
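The teleportation protocol can be simulated on the eight amplitudes of the three-qubit state. The Python sketch below is a minimal illustration: the function name `teleport` and the test amplitudes are hypothetical, $Y$ uses the convention implied by the decoding table ($Y|0\rangle = -|1\rangle$, $Y|1\rangle = |0\rangle$), and Alice's random measurement is replaced by looping over all four outcomes.

```python
import math

s = 1 / math.sqrt(2)
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
Y = [[0, 1], [-1, 0]]      # convention inferred from the table: Y|0> = -|1>, Y|1> = |0>
H = [[s, s], [s, -s]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

def kron(a, b):
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def apply(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def teleport(a, b):
    """For each of Alice's outcomes 0..3, return Bob's qubit after his decoding step."""
    psi0 = [s, 0, 0, s]
    state = [x * y for x in [a, b] for y in psi0]        # |phi> tensor |psi_0>
    state = apply(kron(CNOT, I2), state)                 # C_not on Alice's two qubits
    state = apply(kron(H, kron(I2, I2)), state)          # H on the first qubit
    results = {}
    for outcome in range(4):                             # outcome = Alice's two bits
        bob = state[2 * outcome: 2 * outcome + 2]        # unnormalized projection
        norm = math.sqrt(sum(abs(x) ** 2 for x in bob))
        bob = [x / norm for x in bob]
        results[outcome] = apply([I2, X, Z, Y][outcome], bob)
    return results

results = teleport(0.6, 0.8j)    # illustrative amplitudes with |a|^2 + |b|^2 = 1
```

For every outcome, Bob's decoded amplitudes equal the original pair $(a, b)$, while Alice is left holding only a standard basis state.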

5.4 Realizing Unitary Transformations as Quantum Circuits 

This section shows how arbitrary unitary transformations can be implemented from a set of 
primitive transformations. The primitive set we consider includes the two-qubit $C_{not}$ gate, in 
addition to three kinds of single-qubit gates. Using just these four types of operations, any arbitrary 
$n$-qubit unitary transformation can be implemented. Section 5.4.1 shows that general single-qubit 
transformations can be decomposed into products of the three kinds of primitive single-qubit 
operators. Sections 5.4.2 and 5.4.3 show how to construct multiple-qubit controlled versions 
of single-qubit transformations. Section 5.4.4 uses these transformations to construct arbitrary 
unitary transformations. 

This chapter merely shows that all quantum transformations can be implemented in terms of 
simple gates; we are not yet concerned with the efficiency of such implementations. Most quantum 
transformations do not have an efficient implementation in terms of simple gates. Much of the 
rest of the book will be devoted to understanding which quantum transformations have efficient 
implementations and how these can be used to solve computational problems. 

5.4.1 Decomposition of Single-Qubit Transformations 

This section shows that all single-qubit transformations can be written as a combination of three 
types of transformations, phase shifts $K(\delta)$, rotations $R(\beta)$, and phase rotations $T(\alpha)$: 

\[
\begin{array}{ll}
K(\delta) = e^{i\delta} I & \text{a phase shift by } \delta \\[4pt]
R(\beta) = \begin{pmatrix} \cos\beta & \sin\beta \\ -\sin\beta & \cos\beta \end{pmatrix} & \text{a rotation by } \beta \\[4pt]
T(\alpha) = \begin{pmatrix} e^{i\alpha} & 0 \\ 0 & e^{-i\alpha} \end{pmatrix} & \text{a phase rotation by } \alpha.
\end{array}
\]

Note that 

\[
K(\delta_1 + \delta_2) = K(\delta_1)K(\delta_2),
\]

\[
R(\beta_1 + \beta_2) = R(\beta_1)R(\beta_2),
\]

and 

\[
T(\alpha_1 + \alpha_2) = T(\alpha_1)T(\alpha_2),
\]

and that the operator $K$ commutes with $K$, $T$, and $R$. 

Rather than write $K(\delta)$, we frequently just write the scalar factor $e^{i\delta}$. Even though, as a 
transformation on a single-qubit system, $K(\delta)$ performs a global phase change, and thus is 
equivalent to the identity on the single-qubit system, we include it here because we will use it 
later as part of multiple-qubit conditional transformations in which this factor becomes a relative 
phase shift that is physically relevant. The transformations $R(\alpha)$ and $T(\alpha)$ are rotations by $2\alpha$ 
about the $y$- and $z$-axes of the Bloch sphere, respectively. 

This paragraph shows that any single-qubit unitary transformation $Q$ can be decomposed into 
a sequence of transformations of the form $Q = K(\delta)T(\alpha)R(\beta)T(\gamma)$. Since $K(\delta)$ is a global 
phase shift with no physical effect, the space of all single-qubit transformations has only three 
real dimensions. Given the transformation 

\[
Q = \begin{pmatrix} u_{00} & u_{01} \\ u_{10} & u_{11} \end{pmatrix},
\]

it follows immediately from the unitarity condition $QQ^\dagger = I$ that $|u_{00}|^2 + |u_{01}|^2 = 1$, 
$u_{00}\overline{u_{10}} + u_{01}\overline{u_{11}} = 0$, and $|u_{11}|^2 + |u_{10}|^2 = 1$. A short calculation gives $|u_{00}| = |u_{11}|$ and 
$|u_{01}| = |u_{10}|$. So the magnitudes of the coefficients can be written as the sine and cosine of some 
angle $\beta$; we can write $Q$ as 

\[
Q = \begin{pmatrix} e^{i\theta_{00}}\cos\beta & e^{i\theta_{01}}\sin\beta \\ -e^{i\theta_{10}}\sin\beta & e^{i\theta_{11}}\cos\beta \end{pmatrix}.
\]

Furthermore, the phases are not independent: $u_{10}\overline{u_{00}} + u_{11}\overline{u_{01}} = 0$ implies that 
$\theta_{10} - \theta_{00} = \theta_{11} - \theta_{01}$. Since 

\[
K(\delta)T(\alpha)R(\beta)T(\gamma) = \begin{pmatrix} e^{i(\delta+\alpha+\gamma)}\cos\beta & e^{i(\delta+\alpha-\gamma)}\sin\beta \\ -e^{i(\delta-\alpha+\gamma)}\sin\beta & e^{i(\delta-\alpha-\gamma)}\cos\beta \end{pmatrix},
\]

we can find $\delta$, $\alpha$, $\gamma$ for a given $Q$ by solving the equations 

\[
\begin{aligned}
\delta + \alpha + \gamma &= \theta_{00}, \\
\delta + \alpha - \gamma &= \theta_{01}, \\
\delta - \alpha + \gamma &= \theta_{10}.
\end{aligned}
\]

Using $\theta_{11} = \theta_{10} - \theta_{00} + \theta_{01}$, it is easy to see that this solution also satisfies $\delta - \alpha - \gamma = \theta_{11}$. 
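Solving these three equations is mechanical, and a short Python sketch can confirm that the recovered angles rebuild $Q$. This is a minimal illustration under stated assumptions: the function name `decompose`, the tolerance `1e-12`, and the treatment of a diagonal $Q$ (where $\theta_{01}$ and $\theta_{10}$ are not fixed by $Q$ and are chosen here to satisfy the phase relation) are choices made for the example, not part of the text.

```python
import cmath
import math

def K(d):
    return [[cmath.exp(1j * d), 0], [0, cmath.exp(1j * d)]]

def R(b):
    return [[math.cos(b), math.sin(b)], [-math.sin(b), math.cos(b)]]

def T(a):
    return [[cmath.exp(1j * a), 0], [0, cmath.exp(-1j * a)]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def decompose(q):
    """Return (delta, alpha, beta, gamma) with q = K(delta) T(alpha) R(beta) T(gamma)."""
    (u00, u01), (u10, u11) = q
    beta = math.atan2(abs(u01), abs(u00))
    t00 = cmath.phase(u00)
    if abs(u01) > 1e-12:
        t01 = cmath.phase(u01)
        t10 = cmath.phase(-u10)              # lower-left entry is -e^{i theta_10} sin(beta)
    else:
        # Diagonal q: pick theta_10 = 0 and theta_01 so that
        # theta_11 = theta_10 - theta_00 + theta_01 still holds.
        t10 = 0.0
        t01 = cmath.phase(u11) + t00
    gamma = (t00 - t01) / 2
    alpha = (t00 - t10) / 2
    delta = (t01 + t10) / 2
    return delta, alpha, beta, gamma

def recompose(delta, alpha, beta, gamma):
    return matmul(K(delta), matmul(T(alpha), matmul(R(beta), T(gamma))))

s = 1 / math.sqrt(2)
hadamard = [[s, s], [s, -s]]
rebuilt = recompose(*decompose(hadamard))
```

For the Hadamard gate this yields $\beta = \pi/4$, $\gamma = 0$, $\alpha = -\pi/2$, and $\delta = \pi/2$, and multiplying the four factors back together reproduces the original matrix up to rounding.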

5.4.2 Singly Controlled Single-Qubit Transformations 

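Let $Q = K(\delta)T(\alpha)R(\beta)T(\gamma)$ be an arbitrary single-qubit unitary transformation. The controlled 
gate $\bigwedge Q$ can be implemented by first constructing $\bigwedge K(\delta)$ and implementing $\bigwedge Q'$ for 
$Q' = T(\alpha)R(\beta)T(\gamma)$. Then $\bigwedge Q = (\bigwedge K(\delta))(\bigwedge Q')$. We now show how to implement these two 
transformations in terms of basic gates. 

The conditional phase shift can be implemented by primitive single-qubit operations: 

\[
\begin{aligned}
\bigwedge K(\delta) &= |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes K(\delta) \\
&= |0\rangle\langle 0| \otimes I + e^{i\delta}|1\rangle\langle 1| \otimes I \\
&= (K(\delta/2)T(-\delta/2)) \otimes I.
\end{aligned}
\]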
Graphically, the implementation looks like 



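It may appear surprising that the conditional phase shift $\bigwedge K(\delta)$ can be realized by a circuit acting 
on the first qubit only, with no transformations acting directly on the second qubit. The reason 
that transformations on the first qubit suffice is that a phase shift affects the whole quantum state, 
not just a single qubit. In particular, $|x\rangle \otimes a|y\rangle = a|x\rangle \otimes |y\rangle$. 

Implementing $\bigwedge Q'$ is slightly more involved. For $Q' = T(\alpha)R(\beta)T(\gamma)$, define the following 
transformations: 

\[
\begin{aligned}
Q_0 &= T(\alpha)R(\beta/2), \\
Q_1 &= R(-\beta/2)\,T\!\left(-\frac{\gamma+\alpha}{2}\right), \\
Q_2 &= T\!\left(\frac{\gamma-\alpha}{2}\right).
\end{aligned}
\]

The claim is that $\bigwedge Q'$ can be defined as 

\[
\bigwedge Q' = (I \otimes Q_0)\,C_{not}\,(I \otimes Q_1)\,C_{not}\,(I \otimes Q_2)
\]

or graphically 

[Figure: circuit for the controlled-$Q'$ gate]

It is easy to see that this circuit performs the following transformation: 

\[
\begin{aligned}
|0\rangle \otimes |x\rangle &\to |0\rangle \otimes Q_0Q_1Q_2|x\rangle, \\
|1\rangle \otimes |x\rangle &\to |1\rangle \otimes Q_0XQ_1XQ_2|x\rangle.
\end{aligned}
\]

Using $R(\beta)R(-\beta) = I$ and $T(\alpha)T(\gamma) = T(\alpha + \gamma)$, the property $Q_0Q_1Q_2 = I$ follows 
immediately from the definition of the $Q_i$. To show that $Q_0XQ_1XQ_2 = Q'$, use $XR(\beta)X = R(-\beta)$ 
and $XT(\alpha)X = T(-\alpha)$. Then 

\[
\begin{aligned}
Q_0XQ_1XQ_2 &= T(\alpha)R(\beta/2)\bigl(XR(-\beta/2)X\bigr)\left(XT\!\left(-\frac{\gamma+\alpha}{2}\right)X\right)T\!\left(\frac{\gamma-\alpha}{2}\right) \\
&= Q'.
\end{aligned}
\]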

In this way, we can realize a version of an arbitrary single-qubit transformation controlled by 
a single qubit. 
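The construction can be verified numerically by multiplying out the circuit and comparing it with the block matrix for the controlled gate. The Python sketch below is illustrative: the function names `controlled` and `expected` and the test angles are hypothetical, and the basis order $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ with the first qubit as control is assumed.

```python
import cmath
import math

def K(d):
    return [[cmath.exp(1j * d), 0], [0, cmath.exp(1j * d)]]

def R(b):
    return [[math.cos(b), math.sin(b)], [-math.sin(b), math.cos(b)]]

def T(a):
    return [[cmath.exp(1j * a), 0], [0, cmath.exp(-1j * a)]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(a, b):
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

I2 = [[1, 0], [0, 1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

def controlled(delta, alpha, beta, gamma):
    """Circuit for the controlled version of Q = K(delta) T(alpha) R(beta) T(gamma)."""
    q0 = matmul(T(alpha), R(beta / 2))
    q1 = matmul(R(-beta / 2), T(-(gamma + alpha) / 2))
    q2 = T((gamma - alpha) / 2)
    phase = kron(matmul(K(delta / 2), T(-delta / 2)), I2)   # controlled K(delta)
    circuit = kron(I2, q2)                                  # first gate applied
    # Remaining gates in the order they are applied; each multiplies on the left.
    for gate in (CNOT, kron(I2, q1), CNOT, kron(I2, q0), phase):
        circuit = matmul(gate, circuit)
    return circuit

def expected(delta, alpha, beta, gamma):
    """Block matrix diag(I, Q) written out directly."""
    q = matmul(K(delta), matmul(T(alpha), matmul(R(beta), T(gamma))))
    return [[1, 0, 0, 0], [0, 1, 0, 0],
            [0, 0, q[0][0], q[0][1]], [0, 0, q[1][0], q[1][1]]]

def max_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

angles = (0.4, 0.3, 1.1, -0.7)          # arbitrary test angles
error = max_diff(controlled(*angles), expected(*angles))
```

The circuit uses five single-qubit gates and two controlled-NOT gates, and the difference from the block matrix is at the level of floating-point rounding.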

5.4.3 Multiply Controlled Single-Qubit Transformations 

The graphical notation of sections 5.2.4 and 5.4.2 for controlled operations generalizes to more 
than one control bit. Let $\bigwedge_k Q$ be the $(k+1)$-qubit transformation that applies $Q$ to qubit 0 
when qubits 1 through $k$ are all 1. For example, the controlled-controlled-NOT gate or Toffoli gate 
$\bigwedge_2 X$, which negates the last bit of three if and only if the first two are both 1, has the following 
graphical representation. 

[Figure: Toffoli gate, with two control bits and one target bit]

The subscript 2 in the notation $\bigwedge_2 X$ indicates that there are two control bits. We write the $C_{not}$ 
gate as both $\bigwedge X$ and $\bigwedge_1 X$. 

The construction of section 5.4.2 can be iterated to obtain arbitrary single-qubit transformations 
controlled by $k$ qubits. To implement $\bigwedge_2 Q$, a three-qubit gate that applies $Q$ controlled by two 
qubits, start by replacing each of $Q_0$, $Q_1$, and $Q_2$ in the previous construction with a single-qubit 
controlled version. 

[Figure: circuit for $\bigwedge_2 Q$]

This circuit can be expanded, as in the previous section, into single-qubit and controlled-NOT 
gates, for a total of 25 single-qubit gates and 12 controlled-NOT gates. Repeating this 
process leads to circuits for controlled versions of single-qubit transformations with $k$ control 
bits, $\bigwedge_k Q$, with $5^k$ single-qubit transformations and $\frac{1}{2}(5^k - 1)$ controlled-NOT gates. As section 
6.4.2 shows, significantly more efficient implementations of $\bigwedge_k Q$ are known. 
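To see how quickly these counts grow, the closed forms can simply be tabulated. The Python sketch below takes the formulas $5^k$ and $\frac{1}{2}(5^k - 1)$ as given from the text; the function name `gate_counts` is hypothetical.

```python
def gate_counts(k):
    """Single-qubit and controlled-NOT gate counts for k control bits (closed forms)."""
    single_qubit = 5 ** k
    controlled_not = (5 ** k - 1) // 2      # 5**k is odd, so the division is exact
    return single_qubit, controlled_not

counts = {k: gate_counts(k) for k in range(1, 6)}
```

The cases $k = 1$ and $k = 2$ give the 5 and 2 gates of section 5.4.2 and the 25 and 12 gates quoted above; by $k = 5$ the construction already needs 3125 single-qubit gates.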

All of the controlled gates seen so far are executed when the control bits are 1. To implement 
a singly controlled gate that is executed when the control bit is 0, the control bit can be negated, 
as in 



For any length $k$ bit-string $s$, temporarily negating the appropriate control qubits in this way 
enables the realization of a controlled gate that applies $Q$ to qubit 0 exactly when the other $k$ 
qubits are in the pattern $s$. More precisely, let $|s\rangle$ be the $k$-qubit standard basis vector labeled with 
bit-string $s$. This construction implements the $(k+1)$-qubit controlled gate that applies the 
single-qubit transformation $Q$ to qubit 0 when qubits 1 through $k$ are in the basis state $|s\rangle$ and does nothing 
to qubit 0 when qubits 1 through $k$ are in a different basis state. Such constructions can be further 
generalized to $(k+1)$-qubit controlled gates that apply the single-qubit transformation $Q$ to qubit 
$i$ when the other qubits are in a specific basis state and do nothing when they are in a different basis 
state. In other words, this transformation applies $Q$ to the two-dimensional subspace spanned by 
the two basis vectors $|x_k \ldots x_i \ldots x_0\rangle$ and $|x_k \ldots \hat{x}_i \ldots x_0\rangle$, where $\hat{x}_i = x_i \oplus 1$, that differ only in 
bit $i$, and it leaves the orthogonal subspace invariant. 

Section 5.4.4 uses such gates to exhibit an explicit implementation of an arbitrary unitary 
transformation. The construction of section 5.4.4 uses two different transformations related to a 
pair consisting of a $k$-bit bit-string $s$ and a single-qubit transformation $Q$: the first applies $Q$ to 
the $i$th qubit with the standard ordering of the basis $\{|0\rangle, |1\rangle\}$ when the other $k$ qubits are in state 
$|s\rangle$, and the second applies $Q$ to the $i$th qubit with the basis in the other order. In other words, this 
second transformation applies $XQX$ to qubit $i$ when the other qubits are in state $|s\rangle$. We use the 
notation $\bigwedge^{i}_{x} Q$, or graphically 

[Figure: gate $Q$ annotated with $i$ above and $x$ below]

where $x$ is a $(k+1)$-bit bit-string such that $x_k \ldots x_{i+1}x_{i-1} \ldots x_0 = s_{k-1} \ldots s_0$, to represent both of 
these transformations depending on the value of $x_i$. When $x_i$ is 0, the single-qubit transformation 
$Q$ is applied. When $x_i$ is 1, the transformation $XQX$ is applied. When $i$ is specified, the notation 
$\hat{x}$ means that the $i$th bit of a bit-string $x$ has been flipped: $\hat{x} = x \oplus 2^i$. For any single-qubit 
transformation $Q$, the transformation $\bigwedge^{i}_{\hat{x}} Q = \bigwedge^{i}_{x} \hat{Q}$, where $\hat{Q} = XQX$. Geometrically $\bigwedge^{i}_{x} Q$ 
is a rotation in the two-dimensional complex subspace spanned by standard basis vectors $|x\rangle$ 
and $|\hat{x}\rangle$. 


Example 5.4.1 On a two-qubit system $|b_1 b_0\rangle$, $\bigwedge^{0}_{10} X$ is the standard $C_{not}$, with $b_1$ being the 
control bit and $b_0$ being the target. The notation $\bigwedge^{0}_{11} X$ also represents the $C_{not}$ transformation 
because $X$ is invariant under reversing the order of the basis for qubit $b_0$: $X = XXX$. The notation 
$\bigwedge^{0}_{00} X$ is a controlled-NOT transformation except that now $X$ is performed only when $b_1$ has value 
0. The notation $\bigwedge^{1}_{01} X$ describes the standard $C_{not}$ but with $b_0$ as the control bit and $b_1$ as the 
target. 


This section showed how to implement multiply controlled single-qubit gates using a number 
of basic gates that is exponential in the number of qubits. Section 6.4.2 shows how to implement 
efficiently any multiply controlled single-qubit operation. That construction uses linearly many 
basic gates and a single additional qubit. 


5.4.4 General Unitary Transformations 

This section presents a systematic way to implement an arbitrary unitary transformation on the 
$2^n$-dimensional vector space associated with the state space of an $n$-qubit system. The intuitive 
idea behind the construction is that any unitary transformation is simply a rotation of the 
$2^n$-dimensional complex vector space underlying the $n$-qubit quantum state space, and that any 
rotation can be obtained by a sequence of rotations in two-dimensional subspaces. 

Let $N = 2^n$. This section writes all matrices in the standard basis, but with a nonstandard 
ordering $\{|x_0\rangle, \ldots, |x_{N-1}\rangle\}$ such that successive basis elements differ by only one bit. Such a 
sequence of binary numbers is called a Gray code. Any Gray code will do. For $0 \le i \le N-2$, 
let $j_i$ be the bit on which $|x_i\rangle$ and $|x_{i+1}\rangle$ differ, and $B_i$ be the shared pattern of all the other bits in 
$|x_i\rangle$ and $|x_{i+1}\rangle$. The next few paragraphs show how to realize an arbitrary unitary operator $U$ as a 
sequence of multiply controlled single-qubit operators $\bigwedge^{j_i}_{x_i} Q$ that perform a series of rotations, 
each in a two-dimensional subspace spanned by successive basis elements. 
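Since any Gray code will do, one convenient choice is the reflected binary Gray code. The Python sketch below generates it and lists, for each pair of successive elements, the bit on which they differ; the function names `gray_code` and `differing_bits` are hypothetical, and bit 0 is taken to be the least significant bit.

```python
def gray_code(n):
    """Reflected binary Gray code on n bits, as a list of integers."""
    return [i ^ (i >> 1) for i in range(2 ** n)]

def differing_bits(codes):
    """Index j_i of the single bit on which codes[i] and codes[i + 1] differ."""
    return [(a ^ b).bit_length() - 1 for a, b in zip(codes, codes[1:])]

codes = gray_code(3)
bits = differing_bits(codes)
labels = [format(c, "03b") for c in codes]   # bit-strings x_0, ..., x_{N-1}
```

For $n = 3$ the ordering is 000, 001, 011, 010, 110, 111, 101, 100, and each successive pair spans one of the two-dimensional subspaces in which the construction rotates.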

Consider transformations $U_m$ of the form 

\[
U_m = \begin{pmatrix} I^{(m)} & 0 \\ 0 & V_{N-m} \end{pmatrix},
\]

where $I^{(m)}$ is the $m \times m$ identity matrix and $V_{N-m}$ is an $(N-m) \times (N-m)$ unitary matrix with 
$0 \le m \le N-2$. We wish to show that given any $N \times N$ matrix $U_{m-1}$, $0 < m \le N-2$, of this 
form there exist operators $C_m$, the product of multiply controlled single-qubit operators, and a 
$U_m$, now with a larger identity component $I^{(m)}$, such that $U_{m-1} = C_m U_m$. Then, taking $V_N = U$, 
the unitary operator $U$ can be written as 

\[
U = U_0 = C_1 \cdots C_{N-2}U_{N-2}.
\]

The transformation $U_{N-2}$ has the form 

\[
U_{N-2} = \begin{pmatrix} I^{(N-2)} & 0 \\ 0 & V_2 \end{pmatrix},
\]

which is simply the operation $\bigwedge^{j}_{x} V_2$ where $x = x_{N-2}$ and, using the Gray code condition, 
$j = j_{N-2}$ is the bit in which the last two basis vectors $|x_{N-2}\rangle$ and $|x_{N-1}\rangle$ differ. So once we show how 
to implement the $C_m$ using multiply controlled single-qubit operators, we will have succeeded in 
showing that any unitary operator can be expressed in terms of such operators, and thus can be 
implemented using only $C_{not}$, $K(\delta)$, $R(\beta)$, and $T(\alpha)$. 

The basis vector $|x_m\rangle$ is the first basis vector on which $U_{m-1}$ acts nontrivially. Write 

\[
|v_m\rangle = U_{m-1}|x_m\rangle = a_m|x_m\rangle + \cdots + a_N|x_N\rangle.
\]

We may assume that $a_N$ is real, since we can multiply $U_{m-1}$ by a global phase. If we can find a 
unitary transformation $W_m$, composed only of multiply controlled single-qubit transformations, 
that takes $|v_m\rangle$ to $|x_m\rangle$ and does not affect any of the first $m$ elements of the basis, $W_m U_{m-1}$ would 
have the desired form, so we would take $U_m = W_m U_{m-1}$ and $C_m = W_m^{-1}$. To define $W_m$, begin 
by rewriting the coefficients of the last two components of $|v_m\rangle$: 

\[
|v_m\rangle = a_m|x_m\rangle + \cdots + c_{N-1}\cos(\theta_{N-1})e^{i\phi_{N-1}}|x_{N-1}\rangle + c_{N-1}\sin(\theta_{N-1})|x_N\rangle,
\]

where 

\[
\begin{aligned}
a_{N-1} &= |a_{N-1}|e^{i\phi_{N-1}}, \\
c_{N-1} &= \sqrt{|a_{N-1}|^2 + |a_N|^2}, \\
\cos(\theta_{N-1}) &= |a_{N-1}|/c_{N-1}, \\
\sin(\theta_{N-1}) &= |a_N|/c_{N-1}.
\end{aligned}
\]

Then 

\[
\bigwedge\nolimits^{j_{N-1}}_{x_{N-1}} R(\theta_{N-1}) \; \bigwedge\nolimits^{j_{N-1}}_{x_{N-1}} K(-\phi_{N-1})
\]

takes $|v_m\rangle$ to 

\[
a_m|x_m\rangle + \cdots + a'_{N-1}|x_{N-1}\rangle,
\]

where $a'_{N-1} = c_{N-1}$, since $\bigwedge^{j_{N-1}}_{x_{N-1}} K(-\phi_{N-1})$ cancels the phase factor, and $\bigwedge^{j_{N-1}}_{x_{N-1}} R(\theta_{N-1})$ 
rotates so that all of the amplitude that was in $|x_N\rangle$ is now in $|x_{N-1}\rangle$. None of the other basis 
vectors are affected because the controlled part of the operators ensures that only basis vectors 
with bits in pattern $B_{N-1}$ are affected. To obtain the rest of $W_m$, we iterate this procedure over all 
pairs of coordinates $\{a_{N-2}, a'_{N-1}\}$ through $\{a_m, a'_{m+1}\}$ to obtain the operator 

\[
W_m = \bigwedge\nolimits^{j_m}_{x_m} R(\theta_m) \; \bigwedge\nolimits^{j_m}_{x_m} K(-\phi_m) \cdots \bigwedge\nolimits^{j_{N-1}}_{x_{N-1}} R(\theta_{N-1}) \; \bigwedge\nolimits^{j_{N-1}}_{x_{N-1}} K(-\phi_{N-1}),
\]

which takes $|v_m\rangle$ to $a'_m|x_m\rangle$, where 

\[
\begin{aligned}
a_i &= |a_i|e^{i\phi_i}, \\
c_i &= \sqrt{|a_i|^2 + |a'_{i+1}|^2}, \\
\cos(\theta_i) &= |a_i|/c_i, \\
\sin(\theta_i) &= |a'_{i+1}|/c_i.
\end{aligned}
\]

The coefficient $a'_m = 1$, since the image of $|v_m\rangle$ must be a unit vector, and the final $\bigwedge^{j_m}_{x_m} K(-\phi_m)$ 
ensures that it is a positive real. 

While this procedure provides an implementation for any unitary operator U in terms of simple 
transformations, the number of gates needed is exponential in the number of qubits. For this 
reason, it has limited practical value in that more efficient implementations are needed for realistic 
computations. Most unitary operators do not have efficient realizations in terms of simple gates; 
the art of quantum algorithm design is in finding useful unitary operators that have efficient 
implementations. 

5.5 A Universally Approximating Set of Gates 

Section 5.4 showed that all unitary transformations can be realized as a sequence of single-qubit 
transformations and controlled-NOT gates. From a practical point of view, we would prefer to 
deal with a finite set of gates. It is easy to show that for any finite set of gates there are unitary 
transformations that cannot be realized as a combination of these gates, but there are finite sets of 
gates that can approximate any unitary transformation to arbitrary accuracy. Furthermore, for any 
desired level of accuracy $2^{-d}$, this approximation can be done efficiently; there is a polynomial 
$p(d)$ such that any single-qubit unitary transformation can be approximated to within $2^{-d}$ by a 
sequence of no more than $p(d)$ gates from the finite set. We will not prove this efficiency result, 
known as the Solovay-Kitaev theorem, but we will exhibit a finite set of gates that can be used to 
approximate all unitary transformations. 

Since any unitary transformation can be realized using single-qubit and $C_{not}$ gates, it suffices 
to find a finite set of gates that can approximate all single-qubit transformations. Consider the set 
consisting of the Hadamard gate $H$, the phase gate $P_{\pi/2}$, the $\pi/8$-gate $P_{\pi/4}$, and the $C_{not}$ gate, where 

\[
P_{\pi/2} = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/2} \end{pmatrix} = |0\rangle\langle 0| + i|1\rangle\langle 1|
\]

and 

\[
P_{\pi/4} = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix} = |0\rangle\langle 0| + e^{i\pi/4}|1\rangle\langle 1|.
\]

Recall from section 5.4.1 the single-qubit operator $T(\theta) = e^{i\theta}|0\rangle\langle 0| + e^{-i\theta}|1\rangle\langle 1|$. The $\pi/8$-gate 
$P_{\pi/4}$ got its name because, up to a global phase, it acts in the same way as the gate $T(-\frac{\pi}{8})$, 
and unfortunately the name stuck in spite of the confusion it causes. (When used on their own, it 
does not matter whether $P_{\pi/4}$ or $T(-\frac{\pi}{8})$ is used, since they differ only in a global phase, but when 
used as part of a controlled gate construction, this phase becomes a physically relevant relative 
phase.) 
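The global phase relating the two gates can be made explicit: $P_{\pi/4} = e^{i\pi/8}\,T(-\frac{\pi}{8})$. The short Python check below is only a sketch; the variable names are hypothetical.

```python
import cmath
import math

P_PI_4 = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]          # the pi/8-gate
T_MINUS_PI_8 = [[cmath.exp(-1j * math.pi / 8), 0],
                [0, cmath.exp(1j * math.pi / 8)]]            # T(-pi/8)

global_phase = cmath.exp(1j * math.pi / 8)
rescaled = [[global_phase * entry for entry in row] for row in T_MINUS_PI_8]

difference = max(abs(rescaled[i][j] - P_PI_4[i][j]) for i in range(2) for j in range(2))
```

Multiplying every entry of $T(-\frac{\pi}{8})$ by the same scalar $e^{i\pi/8}$ reproduces $P_{\pi/4}$, so on a single qubit the two gates are physically indistinguishable; the scalar only matters once the gate is controlled.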

A rotation $R$ is a rational rotation if, for some positive integer $m$, $R^m = I$. If no such $m$ exists, then 
$R$ is an irrational rotation. It may seem surprising that a set of gates consisting only of rational 
rotations on the Bloch sphere can approximate all single-qubit transformations. Don’t we 
need an irrational rotation? In fact, the proof proceeds by using these gates to construct an irrational 
rotation. Such a construction is possible because the group of rotations of a sphere differs 
from the group of rotations of a Euclidean plane. In the Euclidean plane, the product of two 
rational rotations is always rational, but the analogous statement is not true for rotations of the 
sphere. Exercise 5.21 guides the reader through proofs of the relevant properties of groups of 
rotations of the sphere and the Euclidean plane. 

Exercises 5.19-5.22 develop the steps in the following spherical geometry argument in more 
detail. The gate $P_{\pi/4}$ is a rotation by $\pi/4$ about the $z$-axis of the Bloch sphere. The transformation 
$S = HP_{\pi/4}H$ is a rotation by $\pi/4$ about the $x$-axis. It is a good exercise in spherical geometry 
to show that $V = P_{\pi/4}S$ is an irrational rotation. Since $V$ is irrational, any rotation $W$ about the 
same axis can be approximated to within arbitrary precision $2^{-d}$ by some power of $V$. Recall 
from section 5.4.1 that any single-qubit transformation may be achieved (up to global phase) by 
combining rotations about the $y$- and $z$-axes: for every single-qubit operation $W$ there exist angles 
$\alpha$, $\beta$, $\gamma$, and $\delta$ such that 

\[
W = K(\delta)T(\alpha)R(\beta)T(\gamma),
\]

where $T(\alpha)$ rotates by angle $2\alpha$ about the $z$-axis and $R(\beta)$ rotates by angle $2\beta$ about the $y$-axis. The 
set of rotations about any two distinct axes can achieve arbitrary single-qubit transformations. 
Since $HVH$ has a different axis from $V$, the two transformations $H$ and $V$ generate all single-qubit 
operators. Other universally approximating finite sets, with varying advantages and disadvantages, 
exist. 

5.6 The Standard Circuit Model 

A circuit model for quantum computation describes all computations in terms of a circuit composed 
of simple gates followed by a sequence of measurements. The simple gates are drawn either from 
a universal set of simple gates or a universally approximating set of quantum gates. The standard 
circuit model for quantum computation takes as its gate set the $C_{not}$ gate together with all 
single-qubit transformations, and it takes as its set of measurements single-qubit measurements in the 
standard basis. So all computations in the standard model consist of a sequence of single-qubit 
and $C_{not}$ gates followed by a sequence of single-qubit measurements in the standard basis. While 
a finite set of gates would be more realistic than the infinite set of all single-qubit transformations, 
the infinite set is easier to work with and, by the results of Solovay and Kitaev, the infinite set 
does not yield significantly greater computational power. For conceptual clarity, the n qubits of 
the computation are often organized into registers, subsets of the n qubits. 

Other models of quantum computation exist. Each model provides its own insights into the 
workings of quantum computation, and each has contributed to the growth of the field through 
new algorithms, new approaches to robust quantum computation, or new approaches to building 
quantum computers. The most significant of these models will be discussed in section 13.4. 

One of the strengths of the standard circuit model is that it makes finding quantum analogs of 
classical computation straightforward. That is the subject of the next chapter. Finding quantum 
analogs of reversible classical circuits is easy; all of the technical difficulties involve the entirely 
classical problem of converting an arbitrary classical circuit into a reversible classical circuit. 
The results of section 5.4 show that any quantum transformation can be realized in terms of the 
basic gates of the standard circuit model, but they say nothing about efficiency. Chapter 6 finds not 
only a quantum analog for any classical computation, but also a quantum analog with comparable 
efficiency. Part II explores the design of quantum algorithms, which involves finding quantum 
transformations that can be efficiently implemented in terms of the basic gates of the standard 
circuit model and figuring out how to use them to solve certain problems more efficiently than is 
possible classically. 

5.7 References 

The no-cloning theorem is due to Wootters and Zurek [286]. Both dense coding and quantum 
teleportation were discovered in the early 1990s, dense coding by Bennett and Wiesner [46] 
and quantum teleportation by Bennett et al. [44]. Single-qubit teleportation has been realized in 
several experiments; see, for example, [57], [221], and [56]. 





An outline for a proof of the Solovay-Kitaev theorem was given in [173]. Dawson and Nielsen 
provide a pedagogical review of this result in [95]. A related issue, namely how much precision 
is needed to carry out a quantum computation of $k$ steps, is answered by Bernstein and Vazirani 
[49]: a precision of $O(\log k)$ bits suffices. (See box 6.1 for the $O(t)$ notation.) 

The implementation of complex unitary transformations from basic ones is described in a paper 
by Barenco et al. [31]. 

A proof that most quantum transformations cannot be implemented efficiently and exactly 
in terms of two-qubit gates can be found in Knill’s Approximation by Quantum Circuits [177]. 
Deutsch found a single three-qubit gate that by itself can produce arbitrarily good approximations 
to any unitary transformation [100]. Later, Deutsch, Barenco, and Ekert showed that almost any 
two-qubit gate could accomplish the same thing [101]. Others have found other small sets of 
generators. 

5.8 Exercises 

Exercise 5.1. Show that any linear transformation $U$ that takes unit vectors to unit vectors 
preserves orthogonality: if subspaces $S_1$ and $S_2$ are orthogonal, then so are $US_1$ and $US_2$. 

Exercise 5.2. For which sets of states is there a cloning operator? If the set has a cloning operator, 
give the operator. If not, explain your reasoning. 

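a. $\{|0\rangle, |1\rangle\}$, 

b. $\{|+\rangle, |-\rangle\}$, 

c. $\{|0\rangle, |1\rangle, |+\rangle, |-\rangle\}$, 

d. $\{|0\rangle|+\rangle, |0\rangle|-\rangle, |1\rangle|+\rangle, |1\rangle|-\rangle\}$, 

e. $\{a|0\rangle + b|1\rangle\}$, where $|a|^2 + |b|^2 = 1$. 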

Exercise 5.3. Suppose Eve attacks the BB84 quantum key distribution of section 2.4 as follows. 
For each qubit she intercepts, she prepares a second qubit in state $|0\rangle$, applies a $C_{not}$ from the 
transmitted qubit to her prepared qubit, sends the first qubit on to Bob, and measures her qubit. 
How much information can she gain, on average, in this way? What is the probability that she is 
detected by Alice and Bob when they compare s bits? How do these quantities compare to those 
of the direct measure-and-transmit strategy discussed in section 2.4? 

Exercise 5.4. Prove that the following are decompositions for some of the standard gates. 

$$I = K(0)T(0)R(0)T(0)$$

$$X = -iT(\pi/2)R(\pi/2)T(0)$$

$$H = -iT(\pi/2)R(\pi/4)T(0)$$

Exercise 5.5. A vector $|\psi\rangle$ is stabilized by an operator $U$ if $U|\psi\rangle = |\psi\rangle$. Find the set of vectors 
stabilized by 

a. the Pauli operator $X$, 

b. the Pauli operator $Y$, 

c. the Pauli operator $Z$, 

d. $X \otimes X$, 

e. $Z \otimes X$, 

f. $C_{not}$. 

Exercise 5.6. 

a. Show that $R(\alpha)$ is a rotation of $2\alpha$ about the y-axis of the Bloch sphere. 

b. Show that $T(\beta)$ is a rotation of $2\beta$ about the z-axis of the Bloch sphere. 

c. Find a family of single-qubit transformations that correspond to rotations of $2\gamma$ about the x-axis. 

Exercise 5.7. Show that the Pauli operators form a basis for all linear operators on a two- 
dimensional space. 

Exercise 5.8. What measurement does the operator i Y describe? 

Exercise 5.9. How can the circuit of figure 5.2 be used to measure the qubits $b_0$ and $b_1$ for 
equality without learning anything else about the state of $b_0$ and $b_1$? (Hint: you are free to choose 
any initial state on the register consisting of qubits $a_0$ and $a_1$.) 

Exercise 5.10. An $n$-qubit cat state is the state $\frac{1}{\sqrt{2}}(|00\dots0\rangle + |11\dots1\rangle)$. Design a circuit that, 
upon input of $|00\dots0\rangle$, constructs a cat state. 

Exercise 5.11. Let 

$$|W_n\rangle = \frac{1}{\sqrt{n}}\left(|0\dots001\rangle + |0\dots010\rangle + |0\dots100\rangle + \cdots + |1\dots000\rangle\right).$$

Design a circuit that, upon input of $|00\dots0\rangle$, constructs $|W_n\rangle$. 

Exercise 5.12. Design a circuit that constructs the Hardy state 



Figure 5.2 

Circuit for exercise 5.9. 

Exercise 5.13. Show that the swap circuit of section 5.2.4 does indeed swap two single-qubit 
values in that it sends $|\psi\rangle|\phi\rangle$ to $|\phi\rangle|\psi\rangle$ for all single-qubit states $|\psi\rangle$ and $|\phi\rangle$. 

Exercise 5.14. Show how to implement the Toffoli gate $\bigwedge_2 X$ in terms of single-qubit and $C_{not}$ 
gates. 

Exercise 5.15. Design a circuit that determines if two single qubits are in the same quantum 
state. The circuit may include an ancilla qubit to be measured. The measurement should give a 
positive answer if the two qubit states are identical, a negative answer if the two qubit states are 
orthogonal, and be more likely to give a positive answer the closer the states are to being identical. 

Exercise 5.16. Design a circuit that permutes the values of three qubits in that it sends $|\psi\rangle|\phi\rangle|\eta\rangle$ 
to $|\phi\rangle|\eta\rangle|\psi\rangle$ for all single-qubit states $|\psi\rangle$, $|\phi\rangle$, and $|\eta\rangle$. 

Exercise 5.17. Compare the effect of the following two circuits 


[Figure: the two circuits to be compared.]



Exercise 5.18. Show that for any finite set of gates there must exist unitary transformations that 
cannot be realized as a sequence of transformations chosen from this set. 

Exercise 5.19. Let R be an irrational rotation about some axis of a sphere. Show that for any 
other rotation $R'$ about the same axis and for any desired level of approximation $2^{-d}$ there is some 
power of R that approximates R' to the desired level of accuracy. 

Exercise 5.20. Show that the set of rotations about any two distinct axes of the Bloch sphere 
generate all single-qubit transformations (up to global phase). 

Exercise 5.21. 

a. In the Euclidean plane, show that a rotation of angle $\theta$ may be achieved by composing two 
reflections. 

b. Use part (a) to show that a clockwise rotation of angle $\theta$ about a point $P$ followed by a clockwise 
rotation of angle $\phi$ about a point $Q$ results in a clockwise rotation of angle $\theta + \phi$ around the point 
$R$, where $R$ is the intersection point of the two rays, one through $P$ at angle $\theta/2$ from the line 
between $P$ and $Q$, and the other through point $Q$ at an angle of $\phi/2$ from the line between $P$ and $Q$. 

c. Show that the product of any two rational rotations of the Euclidean plane is also rational. 

d. On a sphere of radius 1, a triangle with angles $\theta$, $\phi$, and $\eta$ has area $\theta + \phi + \eta - \pi$ (where $\theta$, $\phi$, and $\eta$ 
are in radians). Use this fact to describe the result of rotating clockwise by angle $\theta$ around a point 
$P$ followed by rotating clockwise by angle $\phi$ around a point $Q$ in terms of the area of a triangle. 

e. Prove that on the sphere the product of two rational rotations may be an irrational rotation. 

Exercise 5.22. 

a. Show that the gates $H$, $P_{\pi/2}$, and $P_{\pi/4}$ are all (up to global phase) rational rotations of the Bloch 
sphere. Give the axis of rotation and the angle of rotation for each of these gates, and also the 
gate $S = HP_{\pi/4}H$. 

b. Show that the transformation $V = P_{\pi/4}S$ is an irrational rotation of the Bloch sphere. 


6 Quantum Versions of Classical Computations 


This chapter constructs, for any classical computation, a quantum circuit that can perform the 
same computation with comparable efficiency. This result proves that quantum computation is 
at least as powerful as classical computation. In addition, many quantum algorithms begin by 
using this construction to compute a classical function on a superposition of values prior to using 
nonclassical means for efficiently extracting information from this superposition. 

The construction of quantum analogs to all classical computations relies on a classical result 
that constructs a reversible analog to any classical computation. Section 6.1 describes relations 
between classical reversible computation and both general classical computation and quantum 
computation. Section 6.1.1 exhibits reversible versions of Boolean logic gates and quantum analogs 
of these reversible versions. Given a classical reversible circuit composed of reversible 
Boolean logic gates, simple substitution of the analogous quantum gates for the reversible gates 
gives the desired quantum circuit. The hard step in proving that every classical computation has a 
comparably efficient quantum analog is proving that every classical computation has a reversible 
version of comparable efficiency. Although this construction is purely classical, it is of such 
fundamental importance to quantum computation that we present it here. Section 6.2 provides 
this construction. Section 6.3 describes the language that section 6.4 uses to specify explicit 
quantum circuits for several classical functions such as arithmetic operations. 

6.1 From Reversible Classical Computations to Quantum Computations 

Any sequence of quantum transforms effects a unitary transformation $U$ on the quantum system. As 
long as no measurements are made, the initial quantum state of the system prior to a computation 
can be recovered from the final quantum state $|\psi\rangle$ by running $U^{-1} = U^\dagger$ on $|\psi\rangle$. Thus, any 
quantum computation is reversible prior to measurement in the sense that the input can always 
be computed from the output. 

In contrast, classical computations are not in general reversible: it is not usually possible to 
compute the input from the output. For example, while the classical not operation is reversible, 
the and, or, and nand are not. Every classical computation does, however, have a classical 
reversible analog that takes only slightly more computational resources. Section 6.1.1 shows how 
to make basic Boolean gates reversible. Section 6.2.2 shows how to make entire Boolean circuits 
reversible in a resource-efficient way, considering space, the number of bits required, and the 
number of primitive gates. This construction of efficient classical reversible versions of arbitrary 
Boolean circuits easily generalizes to a construction of quantum circuits that efficiently implement 
general classical circuits. 

Any classical reversible computation with $n$ input and $n$ output bits simply permutes the $N = 2^n$ 
bit strings. Thus, for any such classical reversible computation there is a permutation $\pi : \mathbf{Z}_N \to \mathbf{Z}_N$ 
sending an input bit string to its output bit string. This permutation can be used to define a 
quantum transformation 

$$U_\pi : \sum_{x=0}^{N-1} a_x |x\rangle \mapsto \sum_{x=0}^{N-1} a_x |\pi(x)\rangle$$

that behaves on the standard basis vectors, viewed as classical bit strings, exactly as $\pi$ did. The 
transformation $U_\pi$ is unitary, since it simply reorders the standard basis elements. 
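
As a small illustration of why $U_\pi$ is unitary, the following Python sketch builds the matrix of $U_\pi$ in the standard basis for a sample permutation and checks that its columns are orthonormal. The sample maps `increment` and `collapse` and the helper names are hypothetical choices made for this example.

```python
def permutation_matrix(pi, n_states):
    """Matrix of U_pi in the standard basis: column x has a single 1 in row pi(x)."""
    u = [[0] * n_states for _ in range(n_states)]
    for x in range(n_states):
        u[pi(x)][x] = 1
    return u

def is_unitary(u):
    """Check that the columns are orthonormal (for a real matrix, U^T U = I)."""
    n = len(u)
    for i in range(n):
        for j in range(n):
            dot = sum(u[k][i] * u[k][j] for k in range(n))
            if dot != (1 if i == j else 0):
                return False
    return True

n = 3
N = 2 ** n

def increment(x):
    """Hypothetical reversible computation: add 1 modulo N."""
    return (x + 1) % N

def collapse(x):
    """Hypothetical irreversible computation: two inputs share each output."""
    return x // 2

reversible_ok = is_unitary(permutation_matrix(increment, N))
irreversible_ok = is_unitary(permutation_matrix(collapse, N))
print(reversible_ok, irreversible_ok)  # True False
```

The second check fails because a map that sends two bit strings to the same output produces two identical columns, which cannot be orthogonal.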

Any classical computation on $n$ input and $m$ output bits defines a function 

$$f : \mathbf{Z}_N \to \mathbf{Z}_M, \qquad x \mapsto f(x)$$

mapping the $N = 2^n$ input bit strings to the $M = 2^m$ output bit strings. Such a function can be 
extended in a canonical way to a reversible function $\pi_f$ acting on $n + m$ bits partitioned into two 
registers, the $n$-bit input register and the $m$-bit output register: 

$$\pi_f : \mathbf{Z}_L \to \mathbf{Z}_L, \qquad (x, y) \mapsto (x, y \oplus f(x)),$$

where $\oplus$ denotes the bitwise exclusive-OR. The function $\pi_f$ acts on the $L = 2^{n+m}$ bit strings, each 
made up of an $n$-bit bit string $x$ and an $m$-bit bit string $y$. For $y = 0$, the function $\pi_f$ acts like $f$, 
except that the output appears in the output register and the input register retains the input. There 
are many other ways of making a classical computation reversible, and for a particular classical 
computation, there may be a reversible version that requires fewer bits, but this construction 
always works. 
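
The canonical extension can be tried out directly. The sketch below, in Python, builds $\pi_f$ for a hypothetical non-injective function `f` (chosen only for illustration) and checks three properties described above: $\pi_f$ is a bijection, it is its own inverse, and with $y = 0$ it leaves $f(x)$ in the output register.

```python
def make_pi_f(f):
    """Return the reversible extension (x, y) -> (x, y XOR f(x))."""
    def pi_f(x, y):
        return x, y ^ f(x)
    return pi_f

n, m = 3, 2  # sizes of the input and output registers

def f(x):
    """Hypothetical irreversible function from n-bit to m-bit strings."""
    return (x * x) % 4

pi_f = make_pi_f(f)
pairs = [(x, y) for x in range(2 ** n) for y in range(2 ** m)]

# pi_f permutes the 2^(n+m) pairs, even though f itself is not injective.
is_bijection = sorted(pi_f(x, y) for x, y in pairs) == pairs
# XORing f(x) in twice cancels, so pi_f undoes itself.
self_inverse = all(pi_f(*pi_f(x, y)) == (x, y) for x, y in pairs)
# With the output register cleared, the result is (x, f(x)).
reads_f = all(pi_f(x, 0) == (x, f(x)) for x in range(2 ** n))
print(is_bijection, self_inverse, reads_f)  # True True True
```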

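Since $\pi_f$ is reversible, there is a corresponding unitary transformation $U_f : |x, y\rangle \to 
|x, y \oplus f(x)\rangle$. Graphically the transformation $U_f$ is depicted as 

[Circuit diagram: a box labeled $U_f$ taking inputs $|x\rangle$ and $|y\rangle$ to outputs $|x\rangle$ and $|y \oplus f(x)\rangle$.] 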







Section 5.4 showed how to implement any unitary operation in terms of simple gates. For most 
unitary transformations, that implementation is highly inefficient. While most unitary operators 
do not have an efficient implementation, $U_f$ has an efficient implementation as long as there is a 
classical circuit that computes $f$ efficiently. The method for constructing an efficient implementation 
of $U_f$ from an efficient classical circuit for $f$ has two parts. The first part constructs an efficient 
reversible classical circuit that computes $f$. The second part substitutes quantum gates for each of 
the reversible gates that make up the reversible classical circuit. Section 6.1.1 defines reversible 
Boolean logic gates and covers the easy second part of the construction. Section 6.2 explains 
the involved construction of an efficient reversible classical circuit for any efficient classical 
circuit. 

6.1.1 Reversible and Quantum Versions of Simple Classical Gates 

This section describes reversible versions of the Boolean logic gates not, xor, and, and nand. 
Quantum versions of these gates act like the reversible gates on elements of the standard basis. 
Their action on other input states is prescribed by the linearity of quantum operations; the action 
of a gate on a superposition is the linear combination of the action of the gate on the standard 
basis elements making up the superposition. In this way, the behavior of a reversible gate fully 
defines the behavior of its quantum analog, and vice versa. The tight connection between the two 
allows us to use the same notation for both gates with the understanding that the quantum gates 
can be applied to arbitrary superpositions, whereas the classical reversible gates are applied to bit 
strings that correspond to the standard basis elements. 

Let $b_1$ and $b_0$ be two binary variables, variables taking on only values 0 or 1. We define the 
following quantum gates: 

not The not gate is already reversible. We will use $X$ to refer to both the classical reversible 
gate and the single-qubit operator $X = |0\rangle\langle 1| + |1\rangle\langle 0|$ of section 5.2, which performs a classical 
not operation on classical bits encoded as the standard basis elements. 

xor The controlled negation performed by the $C_{not} = \bigwedge_1 X$ gate amounts to an xor operation 
on its input values. It retains the value of the first bit $b_1$, and replaces the value of the bit $b_0$ with 
the xor of the two values: 

$$C_{not} : |b_1, b_0\rangle \mapsto |b_1, b_1 \oplus b_0\rangle.$$

The quantum version behaves like the reversible version on the standard basis vectors, and its 
behavior on all other states can be deduced from the linearity of the operator. 

and It is impossible to perform a reversible and operation with only two bits. The three-bit 
controlled-controlled-NOT gate, or Toffoli gate, $T = \bigwedge_2 X$, can be used to perform a reversible 
and operation: 





$$T|b_1, b_0, 0\rangle = |b_1, b_0, b_1 \wedge b_0\rangle,$$

where $\wedge$ is notation for the classical and of the two bit values. 

The Toffoli gate is defined for all inputs: when the value of the third bit is 1, 

$$T|b_1, b_0, 1\rangle = |b_1, b_0, 1 \oplus b_1 \wedge b_0\rangle.$$

By varying the values of input bits, the Toffoli gate $T$ can be used to construct a complete set of 
Boolean connectives, not just the classical and. Thus, any combinatorial circuit can be constructed 
from Toffoli gates alone. The Toffoli gate computes not, and, xor, and nand in the following 
way: 

$$\begin{aligned}
T|1, 1, x\rangle &= |1, 1, \neg x\rangle \\
T|x, y, 0\rangle &= |x, y, x \wedge y\rangle \\
T|1, x, y\rangle &= |1, x, x \oplus y\rangle \\
T|x, y, 1\rangle &= |x, y, \neg(x \wedge y)\rangle,
\end{aligned}$$

where $\neg$ indicates the classical not acting on the bit value. 
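
The four identities can be confirmed by exhaustive enumeration on classical bits. The following Python sketch models the classical reversible Toffoli gate as a function on three bits (the function name and argument order are our own convention) and checks each identity for every bit value.

```python
from itertools import product

def toffoli(b1, b0, target):
    """Classical Toffoli gate: flip the target exactly when both controls are 1."""
    return b1, b0, target ^ (b1 & b0)

checks = []
for x, y in product((0, 1), repeat=2):
    checks.append(toffoli(1, 1, x) == (1, 1, 1 - x))        # not
    checks.append(toffoli(x, y, 0) == (x, y, x & y))        # and
    checks.append(toffoli(1, x, y) == (1, x, x ^ y))        # xor
    checks.append(toffoli(x, y, 1) == (x, y, 1 - (x & y)))  # nand
all_hold = all(checks)

# Applying the gate twice restores the input, so the gate is its own inverse.
self_inverse = all(toffoli(*toffoli(a, b, c)) == (a, b, c)
                   for a, b, c in product((0, 1), repeat=3))
print(all_hold, self_inverse)  # True True
```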

An alternative to the Toffoli gate, the Fredkin gate $F$, acts as a controlled swap: 

$$F = \bigwedge_1 S,$$

where $S$ is the two-bit swap operation 

$$S : |xy\rangle \to |yx\rangle.$$

The Fredkin gate $F$, like the Toffoli gate $T$, can implement a complete set of classical Boolean 
operators: 

$$\begin{aligned}
F|x, 0, 1\rangle &= |x, x, \neg x\rangle \\
F|x, y, 1\rangle &= |x, y \vee x, y \vee \neg x\rangle \\
F|x, 0, y\rangle &= |x, y \wedge x, y \wedge \neg x\rangle,
\end{aligned}$$

where $\vee$ is notation for the classical or of the two bit values. 
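
The Fredkin identities can be checked in the same exhaustive way. This Python sketch treats the first bit as the control of the swap; the function name is our own.

```python
from itertools import product

def fredkin(control, a, b):
    """Classical Fredkin gate: swap a and b exactly when the control bit is 1."""
    return (control, b, a) if control else (control, a, b)

checks = []
for x, y in product((0, 1), repeat=2):
    not_x = 1 - x
    checks.append(fredkin(x, 0, 1) == (x, x, not_x))              # copy and not
    checks.append(fredkin(x, y, 1) == (x, y | x, y | not_x))      # or
    checks.append(fredkin(x, 0, y) == (x, y & x, y & not_x))      # and
fredkin_ok = all(checks)
print(fredkin_ok)  # True
```

Unlike the Toffoli gate, the Fredkin gate never changes the number of 1 bits in its input, which is why a constant 1 must be supplied to obtain a negation.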

Because a complete set of classical Boolean connectives can be implemented using just the 
Toffoli gate T, or the Fredkin gate F, these gates can be combined to realize arbitrary Boolean 
circuits. Section 6.2 describes explicit implementations of certain classical functions. As the 
equations for the Toffoli gate illustrate, the operations $C_{not}$ and $X$ can be implemented by Toffoli 
gates with the addition of one or two bits permanently set to 1. For clarity, we use $C_{not}$ and $X$ 
gates in our construction, but all constructions can be done using only Toffoli gates, since we 
can replace all uses of $C_{not}$ and $X$ with Toffoli gates that have additional input bits with their 
input values set appropriately. For example, the circuit shown in figure 6.1 implements a one-bit 
full adder using Toffoli and controlled-NOT gates, where $x$ and $y$ are the data bits, $s$ is their sum 
(modulo 2), $c$ is the incoming carry bit, and $c'$ is the new carry bit. Several one-bit adders can be 
strung together to achieve full $n$-bit addition. 

Figure 6.1 

One-bit full adder. The wires are labeled $|c\rangle$, $|x\rangle$, $|y\rangle$, $|s\rangle$, and $|c'\rangle$. 
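
A one-bit full adder built only from Toffoli and controlled-NOT gates can be simulated on classical bits. The gate sequence in the following Python sketch is one possible arrangement, chosen for this example; it is not necessarily the arrangement drawn in figure 6.1. The bits $s$ and $c'$ (named `c_out` in the code) are assumed to start at 0.

```python
from itertools import product

def cnot(bits, control, target):
    """Controlled-NOT on a dictionary of named bits."""
    bits[target] ^= bits[control]

def toffoli(bits, c1, c2, target):
    """Toffoli gate on a dictionary of named bits."""
    bits[target] ^= bits[c1] & bits[c2]

def full_adder(c, x, y):
    """Reversibly add x, y, and the incoming carry c; s and c_out start at 0."""
    bits = {"c": c, "x": x, "y": y, "s": 0, "c_out": 0}
    cnot(bits, "x", "s")                 # s = x
    cnot(bits, "y", "s")                 # s = x XOR y
    toffoli(bits, "x", "y", "c_out")     # c_out = x AND y
    toffoli(bits, "c", "s", "c_out")     # c_out ^= c AND (x XOR y)
    cnot(bits, "c", "s")                 # s = x XOR y XOR c
    return bits

adder_ok = True
for c, x, y in product((0, 1), repeat=3):
    out = full_adder(c, x, y)
    # The sum bit and the new carry together encode x + y + c in binary,
    # and the three input bits are left unchanged.
    adder_ok &= (out["s"] + 2 * out["c_out"] == x + y + c)
    adder_ok &= (out["c"], out["x"], out["y"]) == (c, x, y)
print(adder_ok)  # True
```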

6.2 Reversible Implementations of Classical Circuits 

This section develops systematic ways to turn arbitrary classical Boolean circuits into reversible 
classical circuits of comparable computational efficiency in terms of the number of bits and the 
number of gates. The resulting reversible circuits are composed entirely of Toffoli and negation 
gates. A quantum circuit with the same efficiency as the classical reversible circuit is obtained by 
the trivial substitution of quantum Toffoli and X gates for classical Toffoli and negation gates. 
Thus, as soon as we have an efficient version of a computation in terms of Toffoli gates, we 
immediately know how to obtain a quantum implementation of the same efficiency. 

6.2.1 A Naive Reversible Implementation 

Rather than start with arbitrary Boolean circuits, we consider a classical machine that consists of 
a register of bits and a processing unit. The processing unit performs simple Boolean operations 
or gates on one or two of the bits in the register at a time and stores the result in one of the 
register’s bits. We assume that, for a given size input, the sequence of operations and their order 
of execution are fixed and do not depend on the input data or on other external control. In analogy 
with quantum circuits, we draw bits of the register as horizontal lines. A simple program (for 
four-bit conjunction) for this kind of machine is depicted in figure 6.2. 

An arbitrary Boolean circuit can be transformed into a sequence of operations on a large enough 
register to hold input, output, and intermediate bits. The space complexity of a circuit is the size 
of the register. 

Computations performed by this machine are not reversible in general; by reusing bits in the 
register, the machine erases information that cannot be reconstructed later. A trivial, but highly 
space inefficient, solution to this problem is not to reuse bits during the entire computation. Figure 
6.3 illustrates how the circuit can be made reversible by assigning the results of each operation 
to a new bit. The operation that reversibly computes the conjunction and leaves the result in a bit 
initially set to 0 is, of course, the Toffoli gate. Since the not gate is reversible, and not together 
with and form a complete set of Boolean operations, this construction can be generalized to 
turn any computation using Boolean logic operations into one using only reversible gates. This 
implementation, however, needs an additional bit for every and performed, so if the original 
computation takes $t$ steps, then a reversible one constructed in this naive way requires up to $t$ 
additional bits of space. 

Figure 6.2 

Irreversible classical circuit for four-bit conjunction. 

Figure 6.3 

Reversible classical circuit for four-bit conjunction. 

Furthermore, this additional space is no longer in the 0 state and cannot be directly reused, for 
example, to compose two reversible circuits. Reusing temporary bits will be crucial to keeping 
the space requirements close to that of the original nonreversible classical computation. Resetting 
a bit to zero is not as trivial as it might seem. A transformation that resets a bit to 0, regardless of 
whether it was 0 or 1 before, is not reversible (it loses information), so it cannot be used as part of 
a reversible computation. Reversible computations cannot reclaim space through a simple reset 
operation. They can, however, uncompute any bit set during the course of a reversible computation 
by reversing the part of the computation that computed the bit. 


Example 6.2.1 Consider the computation of figure 6.3. Bits $t_1$ and $t_0$ are temporarily used to 
obtain the output in bit $m_0$. Figure 6.4 shows how to uncompute these bits, resetting them to their 
original 0 value by reversing all but the last step of the circuit in figure 6.3, so that they may be 
reused as part of a continuing computation. Here the temporary bits are reclaimed at the cost of 
roughly doubling the number of steps. 
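
The compute-then-uncompute pattern of this example can be simulated on classical bits. In the following Python sketch the input bits are named `n3` to `n0`, and the particular order in which the Toffoli gates combine them is an assumption made for illustration, since the figures are not reproduced here. The temporary bits `t1` and `t0` end at 0 for every input, and five Toffoli gates are used instead of three.

```python
from itertools import product

def toffoli(bits, c1, c2, target):
    """Toffoli gate on a dictionary of named bits."""
    bits[target] ^= bits[c1] & bits[c2]

# Gates that fill the temporary bits (assumed order), then the output gate.
COMPUTE = [("n3", "n2", "t1"), ("t1", "n1", "t0")]
OUTPUT = ("t0", "n0", "m0")

def and4(n3, n2, n1, n0):
    """Four-bit conjunction that returns its temporary bits to 0."""
    bits = {"n3": n3, "n2": n2, "n1": n1, "n0": n0, "t1": 0, "t0": 0, "m0": 0}
    # Compute, write the output, then reverse every step except the output step.
    schedule = COMPUTE + [OUTPUT] + COMPUTE[::-1]
    for gate in schedule:
        toffoli(bits, *gate)
    return bits, len(schedule)

uncompute_ok = True
for inputs in product((0, 1), repeat=4):
    bits, gate_count = and4(*inputs)
    uncompute_ok &= bits["m0"] == int(all(inputs))
    uncompute_ok &= bits["t1"] == 0 and bits["t0"] == 0
print(uncompute_ok, gate_count)  # True 5
```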


We can reduce the number of bits needed by uncomputing them and reusing them in the course 
of the algorithm. The method of uncomputing bits by performing all of the steps in reverse order, 
except those giving the output, works for any classical Boolean subcircuit. Consider a classical 
Boolean subcircuit of $t$ gates operating on an $s$-bit register. The naive construction requires up to 
$t$ additional bits in the register. 


Example 6.2.2 Suppose we want to construct the conjunction of eight bits. Simply reversing 
the steps, generalizing the approach shown in figure 6.3, would require six additional temporary 
bits and one bit for the final output. We can save space by using the four-bit and circuit of 
figure 6.4 four times and then combining the results as shown in figure 6.5. This construction 
uses two temporary bits in addition to the two temporary bits used in each of the four-bit ands. 
Since each of the four-bit ands uncomputes its temporary bits, these bits can be reused by the 
subsequent four-bit ands. This circuit uses only a total of four additional temporary bits, though 
it does require more gates. 

Figure 6.4 

Reversible circuit that reclaims temporary bits. 

Figure 6.5 

Combining reversible four-bit AND-circuits of figure 6.4 to construct an eight-way conjunction. 


There is an art to deciding when to uncompute which bits to maintain efficiency and to retain 
subresults used subsequently in the computation. The key ideas of this section, adding bits to obtain 
reversibility and uncomputing their values so that they may be reused, are the main ingredients 
of the general construction described in section 6.2.2. By choosing carefully when and what to 
uncompute, it is possible to make a positive tradeoff, sacrificing some additional gates to obtain 
a much more efficient use of space. Examples, such as an explicit efficient implementation of an 
m-way and, are given in section 6.4. 

6.2.2 A General Construction 

This section shows how, by carefully choosing which bits to uncompute when, a reversible version 
of any classical computation can be achieved with only minor increases in the number of gates 
and bits. We show that any classical circuit using $t$ gates and $s$ bits has a reversible counterpart 
using only $O(t^{1+\epsilon})$ gates and $O(s \log t)$ bits. (See box 6.1 for the $O(t)$ notation.) For $t \gg s$, this 
construction uses significantly less space than the $(s + t)$ space of the naive approach described 
in section 6.2.1 at only a small increase in the number of gates. 

Box 6.1 

Notation for Efficiency Bounds 

$O(f(n))$ is the set of functions bounded by $f$. Formally, 

$g \in O(f(n))$ if and only if there exist constants $k$ and $n_0$ such that $|g(n)| \le k|f(n)|$ 
for all $n > n_0$. 

Similarly, $\Omega(f(n))$ is the set of functions such that 

$g \in \Omega(f(n))$ if and only if there exist constants $k$ and $n_0$ such that $|g(n)| \ge k|f(n)|$ 
for all $n > n_0$. 

Finally, the class of functions bounded by $f$ from above and below is 

$$\Theta(f(n)) = O(f(n)) \cap \Omega(f(n)).$$

Figure 6.6 

Converting circuits $C_i$ into reversible ones $R_i$. 


In order to understand how to obtain these bounds, we must consider carefully how many bits 
are being used and in what way. Let $C$ be a classical circuit, composed of and and not gates, 
that uses no more than $t$ gates and $s$ bits. The circuit $C$ can be partitioned in time into $r = \lceil t/s \rceil$ 
subcircuits each containing $s$ or fewer consecutive gates, $C = C_1 C_2 \dots C_r$. Each subcircuit $C_i$ 
has $s$ input and $s$ output bits, some of which may be unchanged. 

Using techniques from section 6.2.1, each circuit $C_i$ can be replaced by a reversible circuit 
$R_i$ that uses at most $s$ additional bits as shown in figure 6.6. The circuit $R_i$ returns its input as 
well as the $s$ output values used in the subsequent computation. The input values will be used to 
uncompute and recompute $R_i$ in order to save space. 

More than $s$ gates may be required to construct $R_i$. In general, $R_i$ can be constructed using at 
most $3s$ gates. While other more efficient constructions are possible, the following three steps 
always work. 

• Step 1 Compute all of the output values in a reversible way. For every and or not gate in the 
original circuit $C_i$, the circuit $R_i$ has a Toffoli or not gate. This step uses the same number of 
gates, $s$, as $C_i$, and uses no more than $s$ additional bits. 

• Step 2 Copy all of the output values, the values used in subsequent parts of the computation, 
to the output register, a set of no more than $s$ additional bits. 

• Step 3 Perform the sequence of gates used to carry out step 1, but this time in reverse order. In 
this way all bits, except those in the output register, are reset to their original values. Specifically, 
all temporary bits are returned to 0, and we have recovered all of the input values. 
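
The three steps can be sketched in Python for a tiny subcircuit. The gate encoding (tuples such as `("toffoli", c1, c2, target)`), the bit layout, and the sample subcircuit are hypothetical choices for this illustration; the copy in step 2 is done with controlled-NOT gates onto output bits that start at 0.

```python
from itertools import product

def apply_gate(gate, bits):
    """Apply one reversible gate, given as a tuple, to a list of bits in place."""
    kind = gate[0]
    if kind == "x":
        bits[gate[1]] ^= 1
    elif kind == "cnot":
        bits[gate[2]] ^= bits[gate[1]]
    elif kind == "toffoli":
        bits[gate[3]] ^= bits[gate[1]] & bits[gate[2]]
    else:
        raise ValueError(kind)

def reversible_block(step1, outputs, out_start):
    """Build step 1, step 2 (copy), and step 3 (step 1 reversed) as one gate list."""
    step2 = [("cnot", src, out_start + k) for k, src in enumerate(outputs)]
    step3 = step1[::-1]  # every gate used here is its own inverse
    return step1 + step2 + step3

def run(gates, bits):
    bits = list(bits)
    for gate in gates:
        apply_gate(gate, bits)
    return bits

# Layout: bits 0-2 hold inputs a, b, c; bit 3 is temporary; bits 4-5 are the output register.
step1 = [("toffoli", 0, 1, 3), ("x", 2)]          # computes a AND b, and NOT c
block = reversible_block(step1, outputs=[3, 2], out_start=4)

result = run(block, [1, 1, 0, 0, 0, 0])
print(result)  # [1, 1, 0, 0, 1, 1]
```

After the block runs, the inputs are restored, the temporary bit is back at 0, and only the output register has changed. The block uses twice the gates of step 1 plus one copy gate per output value, which stays within the $3s$ bound when step 1 has at most $s$ gates and at most $s$ values are copied.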

The circuits $R_1 \dots R_r$, when combined as in figure 6.7, perform the computation $C$ in a 
reversible but space-inefficient way. The subcircuits $R_i$ can be combined in a special way that 
uses space more efficiently by uncomputing and reusing some of the bits. Uncomputing requires 
additional gates, so we must choose carefully when to uncompute in order to reduce the usage of 
space without needing too many more gates. First, we show how to obtain a reversible version 
using $O(t^{\log_2 3})$ gates and $O(s \log t)$ bits, and then we improve on this method to obtain bounds of 
$O(t^{1+\epsilon})$ gates and $O(s \log t)$ bits. 

The basic principle for combining the $r = \lceil t/s \rceil$ circuits $R_i$ is indicated in figure 6.8. The idea 
is to uncompute and recompute parts of the state selectively to reuse the space. We systematically 
modify the computation $R_1 R_2 \dots R_r$ both to reduce the total amount of space used and to reset 
all the temporary bits to zero by the end of the computation. 

To simplify the analysis, we take $r$ to be a power of two, $r = 2^k$. For $1 \le i \le k$, let $r_i = 2^i$. 
We perform the following recursive transformation $B$ that breaks a sequence into two equal-sized 
parts, recursively transforms the parts, and then composes them in the way shown: 

$$B(R_1, \dots, R_{r_{i+1}}) = B(R_1, \dots, R_{r_i})\, B(R_{1+r_i}, \dots, R_{r_{i+1}})\, (B(R_1, \dots, R_{r_i}))^{-1}$$

$$B(R) = R,$$

where $(B(R_1, \dots, R_{r_i}))^{-1}$ acts on exactly the same bits as $B(R_1, \dots, R_{r_i})$ and so requires no 
additional space. 



Figure 6.7 

Composing the circuits to obtain a reversible, but inefficient, version of the circuit $C$. 

Figure 6.8 

Composition of reversible computation circuits $R_i$ in a way that reclaims storage. 


The transformed computation uncomputes all space except the output of the last step, so the 
additional space usage is bounded by $s$. Thus, $B(R_1, \dots, R_{r_{i+1}})$ requires at most $s$ more space 
than $B(R_1, \dots, R_{r_i})$. We can write the space $S(i)$ required for each of the $k = \log_2 r$ steps $i$ in 
the recursion in terms of the space requirements of the previous step: $S(i) \le s + S(i-1)$ with 
$S(1) \le 2s$. The recursion ends after $k = \log_2 r$ steps, so the final computation $B(R_1, \dots, R_r)$ 
requires at most $S(r) \le (k+1)s = s(\log_2 r + 1)$ space. From the definition of $B$, it follows 
immediately that $T(i)$, the number of circuits $R_j$ executed by the computation $B(R_1, \dots, R_{r_i})$, 
is $T(i) = 3T(i-1)$ with $T(1) = 1$. By assumption $r = 2^k$, so the reversible version of $C$ we 
constructed uses 

$$T(2^k) = 3T(2^{k-1}) = 3^k = 3^{\log_2 r} = r^{\log_2 3}$$

reversible circuits $R_i$, each of which requires fewer than $3s$ gates. Thus, any classical computation 
of $t$ steps and $s$ bits can be done reversibly in $O(t^{\log_2 3})$ steps and $O(s \log_2 t)$ bits. 
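
The binary recursion can be unrolled mechanically to see where the count $3^k$ comes from. The following Python sketch represents each executed circuit as a pair (index, direction), with direction $+1$ for $R_i$ and $-1$ for its inverse; this representation and the function names are our own.

```python
import math
from collections import Counter

def schedule(lo, hi):
    """Unroll B(R_lo, ..., R_hi) into a list of (index, direction) pairs."""
    if lo == hi:
        return [(lo, +1)]
    mid = (lo + hi) // 2
    first = schedule(lo, mid)
    second = schedule(mid + 1, hi)
    # The inverse of a sequence runs it backwards with each circuit inverted.
    undo = [(i, -d) for i, d in reversed(first)]
    return first + second + undo

def net_effect(seq):
    """Circuits whose forward and inverse runs do not cancel."""
    net = Counter()
    for i, d in seq:
        net[i] += d
    return {i: n for i, n in net.items() if n != 0}

k = 3
r = 2 ** k
seq = schedule(1, r)
print(len(seq), 3 ** k, round(r ** math.log2(3)))  # 27 27 27
print(net_effect(seq))                             # {8: 1}
```

Every circuit except the last is run forward and backward equally often, so only the output of $R_r$ remains computed at the end.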

To obtain the $O(t^{1+\epsilon})$ bound, instead of using a binary decomposition, consider the following 
$m$-ary decomposition. To simplify the analysis, suppose that $r$ is a power of $m$, $r = m^k$. For 
$1 \le i \le k$, let $r_i = m^i$. Abbreviating $R_{1+(x-1)r_i}, \dots, R_{x r_i}$ as $R_{x,i}$, then 

$$\begin{aligned}
B(R_{1,i+1}) &= B(R_{1,i}, R_{2,i}, \dots, R_{m,i}) \\
&= B(R_{1,i})\, B(R_{2,i}) \cdots B(R_{m-1,i})\, B(R_{m,i})\, B(R_{m-1,i})^{-1} \cdots B(R_{2,i})^{-1}\, B(R_{1,i})^{-1}, \\
B(R) &= R.
\end{aligned}$$

In each step of the recursion, each block is split into $m$ pieces and replaced with $2m - 1$ blocks. We 
may assume without loss of generality that $r = m^k$ for some $k$, in which case we stop recursing 
after $k$ steps. At this point $r = m^k$ subcircuits $C_i$ have been replaced by $(2m-1)^k$ reversible 
circuits $R_i$, so the total number of circuits $R_i$ for the final computation is $(2m-1)^k$, which we 
rewrite in terms of $r$: 

$$(2m-1)^{\log_m r} = r^{\log_m (2m-1)} \le r^{\log_m 2m}.$$

The number of primitive gates in $R_i$ is bounded by $3s$ and $r = \lceil t/s \rceil$. Since 
$\log_m 2m = 1 + \frac{1}{\log_2 m}$, the total number of gates for a reversible circuit of $t$ gates is 

$$T(t) \approx 3s \left(\frac{t}{s}\right)^{1 + \frac{1}{\log_2 m}} < 3t^{1 + \frac{1}{\log_2 m}}.$$

Thus, for any $\epsilon > 0$, it is possible to choose $m$ sufficiently large that the number of gates 
required for the reversible computation is $O(t^{1+\epsilon})$. The space bound remains the same as before, 
$O(s \log_2 t)$. 
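
To see how quickly the exponent approaches 1, the following Python sketch tabulates the exact exponent $\log_m(2m-1)$ and the bound $1 + 1/\log_2 m$ for a few values of $m$, and counts the blocks $(2m-1)^k$ for small cases. The function names and the sample values of $m$ are our own choices.

```python
import math

def block_count(r, m):
    """Number of circuits R_i executed by the m-ary recursion when r = m^k."""
    k = round(math.log(r, m))
    assert m ** k == r, "r must be a power of m"
    return (2 * m - 1) ** k

def exponent_bound(m):
    """The bound log_m(2m) = 1 + 1/log2(m) on the exponent of r."""
    return 1 + 1 / math.log2(m)

for m in (2, 4, 16, 256):
    exact = math.log(2 * m - 1, m)   # exponent of r in the exact block count
    print(m, round(exact, 4), round(exponent_bound(m), 4))

print(block_count(8, 2), block_count(16, 4))  # 27 49
```

For $m = 2$ the exact exponent is $\log_2 3$, matching the binary construction, and the bound falls to $1.125$ at $m = 256$. Larger $m$ lowers the gate exponent, while each level of the recursion must then hold more intermediate results.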

Reversible versions of classical Boolean circuits constructed in this manner can be turned 
directly into quantum circuits consisting entirely of Toffoli and X gates. While our argument was 
given in terms of Boolean circuits, Bennett used the same argument to show that any classical 
Turing machine can be turned into a reversible one. Based on these arguments, from any classical 
circuit for $f$, an implementation of $U_f$ can be constructed with a comparable number of gates and 
bits. 

The care needed in uncomputing and reusing bits generalizes to qubits where the need for 
uncomputing values is even greater: uncomputing ensures that temporary qubits are no longer 
entangled with output qubits. This need to unentangle temporary values at the end of a computation 
is one of the differences between classical and quantum implementations. Quantum transformations, 
being reversible, cannot simply reset qubits. Naively, one might think that temporary qubits 
could be reset by measuring them and then, depending on the measurement outcome, performing 
a transformation to set them to $|0\rangle$. But if the temporary qubits were entangled with qubits 
containing the desired result, or results used later in the computation, measuring the temporary 
qubits may alter those results. Uncomputing temporary qubits disentangles them from the rest of 
the system without affecting the state of the rest of the system. The circuits of section 6.4 contain 
a number of examples of uncomputing temporary qubits. 

The next section sets up the language for quantum implementations used in section 6.4 to 
describe explicit implementations of certain arithmetic functions. These implementations are 
often more efficient than the general construction just given, but they all have analogous classical 
implementations of comparable efficiency. Part II is devoted to truly quantum algorithms, 
algorithms with no classical analog. 


6.3 A Language for Quantum Implementations 


The quantum circuits we have discussed provide one way of describing a sequence of quantum 
gates acting on registers of qubits. We now give an alternate way of describing quantum circuits 
that is more compact and easier to reason about. We use this notation to describe quantum 
implementations for some specific arithmetic functions. When we talk about the efficiency of 
these implementations, we simply count the number of simple gates in the quantum circuits 
they describe. This quantity is called the circuit complexity. We explain the relation of circuit 
complexity with other notions of complexity in section 7.2. 

The notation up to this point is standard and frequently used in the literature. Here we describe 
a language that we developed for describing quantum circuits or sequences of simple quantum 
gates. We use a program-like notation to give concise descriptions of quantum circuits that are 
cumbersome in graphical notation. Moreover, a single program in this notation can describe 
precisely a whole class of circuits acting on variable numbers of qubits as input (and classes 
that depend on other varying classical parameters). For example, just as in the classical case, a 
quantum circuit for adding 24-bit numbers differs from a circuit for adding 12-bit numbers, though 
they may be related. This program-like notation enables us to describe the relation precisely, while 
the graphical notation, though it may be suggestive, remains imprecise. 

The notation uses both classical and quantum variables. Classical control structures such as 
iteration, recursion, and conditionals are used to define the order in which quantum state transformations 
are to be applied. Classical information can be used in the construction of a quantum 
state or as parameters of quantum state transformations, but quantum information cannot be 
used in classical control structures. The programs we write are simply classical prescriptions for 
sequences of quantum gates that operate on a single global quantum register. 

6.3.1 The Basics 

Quantum variables are names for registers, subsets of qubits of a single global quantum register. 
If $x$ is the variable name for an $n$-qubit register, we may write $x[n]$ if we wish to make the number 
of qubits in $x$ explicit. We use $x_i$ to refer to the $i$th qubit of $x$, and $x_i \ldots x_k$ for qubits $i$ through $k$ 
of the register denoted by $x$. We will generally order the qubits of a register from highest index to 
lowest index so that if register $x$ contains a standard basis vector $|b\rangle$, then $b = \sum_i x_i 2^i$. If $U$ is a 
unitary transformation on $n$ qubits, and $x$, $y$, and $z$ are names for registers with a combined total 
of $n$ qubits, then the program step $U|x, y, z\rangle = U|x\rangle|y\rangle|z\rangle$ means “apply $U$ to the qubits denoted 
by the register names in the order given.” It is illegal to use any qubit twice in this notation, so 
the registers $x$, $y$, and $z$ must be disjoint; this restriction is necessary because “wiring” different 
input values to the same quantum bit is not possible or even meaningful. We are abusing the ket 
notation slightly here in that it is sometimes being used to stand for a placeholder, a qubit, that 
can contain a qubit value, a quantum state, and sometimes for the qubit value itself, but context 
should keep the two uses clear. 


Example 6.3.1 The Toffoli gate with control bits $b_5$ and $b_3$ and target bit $b_2$ has the following 
graphical representation 

[Circuit diagram: six wires $b_5, \ldots, b_0$, with controls on $b_5$ and $b_3$ and the target on $b_2$.]

which is awkward to represent in the standard tensor product notation because the qubits it 
acts on are not adjacent. In our notation, this transformation can be written as $T|b_5, b_3, b_2\rangle$. 
The notation $T|b_2, b_2, b_2\rangle$ is not allowed, since it repeats qubits. The notation 
$(T \otimes C_{not} \otimes H)|x_5 \ldots x_3\rangle|x_1, x_0\rangle|x_7\rangle$, for a transformation acting on six qubits of a ten-qubit register 
$x = x_9 x_8 \ldots x_0$, is just another way of representing the transformation 
$I \otimes I \otimes H \otimes I \otimes T \otimes I \otimes C_{not}$, 
where the separate kets indicate which qubits the transformation making up the tensor product is 
acting upon; the Toffoli gate $T$ acts on qubits $x_5$, $x_4$, and $x_3$, the $C_{not}$ on qubits $x_1$ and $x_0$, and 
the Hadamard gate $H$ on qubit $x_7$. The notation $(T \otimes C_{not} \otimes H)|x_5 \ldots x_3\rangle|x_4, x_0\rangle|x_7\rangle$ is illegal 
because the first and second registers are not disjoint: they share qubit $x_4$. 


Controlled operations are so frequently used that we give them their own notation; the notation 
|b⟩ control U|x⟩, where b and x are disjoint registers, means that on any standard basis vector 
the operator U is applied to the contents of register x only if all of the bits in b are 1. Writing 
¬|b⟩ control U|x⟩ is a convenient shorthand for the sequence 

    X ⊗ ··· ⊗ X|b⟩ 
    |b⟩ control U|x⟩ 
    X ⊗ ··· ⊗ X|b⟩. 

If we write a sequence of state transformations, they are intended to be applied in order. 

We allow programs to declare local temporary registers using qubit t[n], provided that the 
program restores the qubits in these registers to their initial $|0\rangle$ state. This condition ensures 
that temporary qubits can be reused for different executions of the program and that the overall 
storage requirement is bounded. Furthermore, it ensures that the temporary qubits do not remain 
entangled with the other registers. 

6.3.2 Functions 

Box 6.2 Language Summary 

Terms 

    U                   Name for a unitary transform 
    U^{-1}              Name for the inverse of U 
    x                   Name for a register of qubits 
    x[k]                Indicates number of qubits in register x 
    qubit x[k]          Indicates x is a name for a register of temporary qubits initially set to |0⟩ 
    qubit t             Indicates t is a name for a temporary qubit initially set to |0⟩ 
    x_i                 Name for the ith qubit of register x 
    x_i ... x_j         A sequence of qubits of register x 
    |r⟩                 Indicates use of qubits named r 

Statements (Γ stands for an abstract statement) 

    U|r⟩                Apply U to qubits named r 
    |b⟩ control Γ       Controlled form of statement Γ with control qubits b 
    ¬|b⟩ control Γ      Statement Γ controlled by negation of qubits b 
    |b_1⟩|b_0⟩ control Γ  Statement Γ controlled by two qubits named b_1 and b_0 
    for i ∈ [a..b]      Perform the sequence of statements Γ(a), Γ(a + 1), ..., Γ(b), which 
      Γ(i)              depend on the classical parameter i 
    define Name|x[k]⟩ = Introduce Name as a name for a statement that performs statements Γ_0 
      Γ_0, Γ_1, ... Γ_n   through Γ_n to a k-qubit register x 
    Name|r⟩             Applies the steps described in the definition of Name to register r. 
    Name^{-1}|r⟩        Applies the inverse of all the steps described in the definition of Name 
                        in reverse order to register r. Since all quantum transformations are 
                        reversible, this transformation is always well defined. 

We allow the introduction of new names for sequences of program steps. Unlike commands 
such as control, the command define does not do anything to the qubits; it simply defines 
a new function by telling the machine what sequence of commands a new function variable 
name represents. For example, addition modulo 2 with an incoming carry bit can be defined as 

    Sum : |c, a, b⟩ → |c, a, (a + b + c) mod 2⟩ 

    define Sum |c⟩|a⟩|b⟩ = 
        |a⟩ control X|b⟩ 
        |c⟩ control X|b⟩. 

It operates on three single qubits by adding the value of a and the value of the carry c to the value 
of b. The program would be drawn as the circuit 

[Circuit diagram: wires c, a, and b, with two Cnot gates targeting b, one controlled by a and one by c.]


A corresponding carry operator is of the form 

    Carry : |c, a, b, c'⟩ → |c, a, b, c' ⊕ C(a, b, c)⟩, 

where the carry C(a, b, c) is 1 if two or more of the bits a, b, c are 1, that is, 
C(a, b, c) = (a ∧ b) ⊕ (c ∧ (a ⊕ b)). A program for Carry might look like 

    define Carry |c, a, b, c'⟩ = 
        |a⟩|b⟩ control X|c'⟩     Compute a ∧ b in register c'                    (1) 
        |a⟩ control X|b⟩         Compute a ⊕ b in register b                     (2) 
        |c⟩|b⟩ control X|c'⟩     Toggle result c' if c and current value of b    (3) 
        |a⟩ control X|b⟩         Reset b to original value                       (4) 

In this program, the register b temporarily holds, starting in step (2), the xor of the original values 
of a and b. In step (3), this value of register b means that c' is toggled if c and exactly one of the 
original values of a and b is 1. Register b is reset to its original value in step (4). 
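
Because Sum and Carry map standard basis states to standard basis states, their action can be checked on classical bits. The following is a minimal sketch under that assumption; the function names `sum_gate`, `carry_gate`, and `majority` are illustrative and not part of the notation defined in the text. Each line of a function body mirrors one program step.

```python
from itertools import product


def sum_gate(c, a, b):
    """Sum: |c, a, b> -> |c, a, (a + b + c) mod 2>, built from two Cnots on b."""
    b ^= a  # |a> control X|b>
    b ^= c  # |c> control X|b>
    return c, a, b


def carry_gate(c, a, b, c_out):
    """Carry: |c, a, b, c'> -> |c, a, b, c' xor C(a, b, c)>."""
    c_out ^= a & b  # (1) compute a AND b into c'
    b ^= a          # (2) b temporarily holds a XOR b
    c_out ^= c & b  # (3) toggle c' if c and (a XOR b)
    b ^= a          # (4) reset b to its original value
    return c, a, b, c_out


def majority(a, b, c):
    """C(a, b, c): 1 if two or more of the three bits are 1."""
    return int(a + b + c >= 2)


if __name__ == "__main__":
    for c, a, b, c_out in product((0, 1), repeat=4):
        assert sum_gate(c, a, b) == (c, a, (a + b + c) % 2)
        assert carry_gate(c, a, b, c_out) == (c, a, b, c_out ^ majority(a, b, c))
    print("Sum and Carry agree with their specifications on all basis states")
```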

Repetition and conditional execution of sequences of quantum state transformations can be 
controlled using classical programming constructs. Only classical, not quantum, information can 
be used in the control structure. However, in quantum algorithms there is a choice as to which 
classical input values are placed in quantum registers and which are used simply as part of the 
classical control structure. For instance, one program to add x to itself n times might take classical 
input n and use it only as part of the classical control, while another might place n in an additional 
quantum register. The two programs would be of the form 

    $A_n : |x, 0\rangle \mapsto |x, nx\rangle$ 

and 

    $A : |x, n, 0\rangle \mapsto |x, n, nx\rangle$ 

respectively. This distinction will be more important when we consider quantum algorithms that 
act on superpositions of input values; only input values placed in quantum registers, not input 
values that are part of the classical control structure, can be in superposition. 

The definition of a new program may use the same program recursively provided that the 
recursion can be unwound classically: recursive application of functions is allowed only as 
a shorthand for a classically specified sequence of quantum transformations. We can use the 
qubit t[n] construction recursively as long as the recursion depth is bounded by a static classical 
constant. 

6.4 Some Example Programs for Arithmetic Operations 

The programs in this section implement quantum circuits for modular arithmetic and supporting 
operations. The operations shown are more general (though less efficient) than the modular 
arithmetic implementations used as part of Shor’s algorithm; here the modulus, M, is placed in a 
quantum register so these algorithms can act on superpositions of different moduli as well as on 
superpositions of other input. 

6.4.1 Efficient Implementation of and 

We give a linear implementation of an $m$-way and computed into an output qubit using just 
one additional temporary qubit. First we define a supporting transformation Flip that generalizes 
the Toffoli gate T. The transformation Flip acts on an $m$-qubit register $a = |a_{m-1} \ldots a_0\rangle$ and an 
$(m - 1)$-qubit register $b = |b_{m-2} \ldots b_0\rangle$ and negates qubit $b_i$ exactly when the $(i + 2)$-conjunction 
$\bigwedge_{j=0}^{i+1} a_j$ is true. We define Flip in terms of Toffoli gates T that perform bit flips on some of the 
qubits of register b depending on the contents of register a. 

    define Flip |a[2]⟩|b[1]⟩ =                       (base case m = 2) 
        T|a_1⟩|a_0⟩|b⟩ 

    define Flip |a[m]⟩|b[m-1]⟩ =                     (general case m ≥ 3) 
        T|a_{m-1}⟩|b_{m-3}⟩|b_{m-2}⟩ 
        Flip |a_{m-2} ... a_0⟩|b_{m-3} ... b_0⟩ 
        T|a_{m-1}⟩|b_{m-3}⟩|b_{m-2}⟩ 

An inductive argument shows that Flip, when defined in this way, behaves as described. The 
transformation Flip, when applied to an $m$-qubit register a and an $(m - 1)$-qubit register b, uses 
$2(m - 2) + 1$ Toffoli gates T. 
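
The inductive claim and the gate count can be checked exhaustively for small $m$ on classical bits. The following is a minimal sketch; the representation of the global register as a Python list, the index-list arguments, and the helper names are assumptions made for illustration. The general case assumes the first control of each Toffoli gate is $a_{m-1}$.

```python
from itertools import product


def toffoli(state, c1, c2, target):
    """Flip state[target] when both control bits are 1."""
    state[target] ^= state[c1] & state[c2]


def flip(state, a, b):
    """Apply Flip to index lists a (m qubits) and b (m - 1 qubits).

    a[i] and b[i] give the positions of a_i and b_i in `state`.
    Returns the number of Toffoli gates applied.
    """
    m = len(a)
    assert m >= 2 and len(b) == m - 1
    if m == 2:  # base case
        toffoli(state, a[1], a[0], b[0])
        return 1
    toffoli(state, a[m - 1], b[m - 3], b[m - 2])
    gates = flip(state, a[:m - 1], b[:m - 2])  # a_{m-2}..a_0 and b_{m-3}..b_0
    toffoli(state, a[m - 1], b[m - 3], b[m - 2])
    return gates + 2


def check_flip(m):
    """Compare Flip with its specification on every basis state; return gate count."""
    a_idx = list(range(m))
    b_idx = list(range(m, 2 * m - 1))
    gates = None
    for bits in product((0, 1), repeat=2 * m - 1):
        state = list(bits)
        gates = flip(state, a_idx, b_idx)
        assert state[:m] == list(bits[:m])  # register a is unchanged
        for i in range(m - 1):
            conj = int(all(bits[j] for j in range(i + 2)))  # a_0 AND ... AND a_{i+1}
            assert state[m + i] == bits[m + i] ^ conj
    return gates


if __name__ == "__main__":
    for m in range(2, 6):
        print(m, check_flip(m), 2 * (m - 2) + 1)
```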

Next we define an AndTemp operation on a $(2m - 1)$-qubit computational state that uses $m - 2$ 
additional qubits to compute an and on $m$ bits. We will shortly use AndTemp to construct an and 
operation that makes more efficient use of qubits. The operation AndTemp places the conjunction 
of the bits in register a in the single-qubit register b, making temporary use of the qubits in 
register c. 

    define AndTemp |a[2]⟩|b[1]⟩ =                    (base case m = 2) 
        T|a_1⟩|a_0⟩|b⟩ 

    define AndTemp |a[m]⟩|b[1]⟩|c[m-2]⟩ =            (general case m ≥ 3) 
        Flip |a⟩ (|b⟩|c⟩)                Compute conjunction in b    (1) 
        Flip |a_{m-2} ... a_0⟩|c⟩        Reset c                     (2) 

The parentheses in Flip |a⟩ (|b⟩|c⟩) indicate that Flip is applied to the $m$-qubit register a and 
the $(m - 1)$-qubit register that is the concatenation of registers b and c. By the definition of Flip, 
step (1) leaves the conjunction of the bits of a in b but changes the contents of c in the process. Step (2) 
undoes these changes to c. Since the first Flip uses $2(m - 2) + 1$ Toffoli gates and the second 
Flip uses $2(m - 3) + 1$ Toffoli gates, AndTemp requires $4m - 8$ gates. An attractive feature of 
this construction of AndTemp is that the $m - 2$ additional qubits in register c can be in any state 
at the start of the computation, and they will be returned to their original states by the end of the 
computation, so that we can use these to compute the $m$-way and if there are sufficiently many 
computational qubits already ($n \ge 2m - 2$). Clever use of this property of AndTemp will allow 
us to define an And on up to $n$ qubits that uses only 1 additional temporary qubit. 

To construct the conjunction using less space, we recursively use AndTemp on one half of the 
qubits, using the other half temporarily and vice versa. Thus, a general And operator that requires 
a single temporary qubit can be defined as follows: Let $k = \lfloor m/2 \rfloor$, and $j = k - 2$ for even $m$, 
$j = k - 1$ for odd $m$. The operator And has the effect of flipping b if and only if all bits of a are 1. 

    define And |a[1]⟩|b[1]⟩ =                        Trivial unary case, m = 1 
        Cnot |a_0⟩|b⟩ 

    define And |a[2]⟩|b[1]⟩ =                        Binary case, m = 2 
        T|a_1⟩|a_0⟩|b⟩ 

    define And |a[m]⟩|b⟩ =                           General case, 3 ≤ m 
        qubit t[1]                                   use a temporary qubit 
        AndTemp |a_{m-1} ... a_k⟩ |t⟩ |a_j ... a_0⟩                  (1) 
        AndTemp (|t⟩|a_j ... a_0⟩) |b⟩ |a_{k+j-1} ... a_k⟩           (2) 
        AndTemp |a_{m-1} ... a_k⟩ |t⟩ |a_j ... a_0⟩                  (3) 

Step (1) computes the conjunction of the high-order bits using the low-order bits temporarily. In 
step (2) we compute the conjunction of the low-order bits using the high-order bits temporarily. 
Step (3) repeats step (1) to reset the temporary qubit t. 
Since AndTemp uses a linear number of gates, so does And. 
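
The idea of borrowing one half of the register as scratch space for the other can be simulated on classical bits. The following is a minimal sketch, not a transcription of the program above: the split point `k = m // 2` and the choice of which qubits serve as temporaries are assumptions chosen so that every AndTemp call receives exactly the number of temporary qubits it needs, and all function names are illustrative.

```python
from itertools import product


def toffoli(s, c1, c2, t):
    s[t] ^= s[c1] & s[c2]


def flip(s, a, b):
    """Flip on index lists a (m qubits) and b (m - 1 qubits), as in section 6.4.1."""
    m = len(a)
    assert m >= 2 and len(b) == m - 1
    if m == 2:
        toffoli(s, a[1], a[0], b[0])
        return
    toffoli(s, a[m - 1], b[m - 3], b[m - 2])
    flip(s, a[:m - 1], b[:m - 2])
    toffoli(s, a[m - 1], b[m - 3], b[m - 2])


def and_temp(s, a, b, c):
    """Flip qubit b iff all qubits in a are 1; the m - 2 qubits in c are restored."""
    m = len(a)
    if m == 2:
        toffoli(s, a[1], a[0], b)
        return
    assert len(c) == m - 2
    flip(s, a, c + [b])     # (1) b is the high qubit of the concatenation (|b>|c>)
    flip(s, a[:m - 1], c)   # (2) undo the changes to c


def and_gate(s, a, b, t):
    """Flip b iff all qubits in a are 1, using one temporary qubit t (initially 0)."""
    m = len(a)
    if m == 1:
        s[b] ^= s[a[0]]
        return
    if m == 2:
        toffoli(s, a[1], a[0], b)
        return
    k = m // 2                      # assumed split point for this sketch
    low, high = a[:k], a[k:]
    and_temp(s, high, t, low[:len(high) - 2])   # (1) t = AND of high bits
    and_temp(s, low + [t], b, high[:k - 1])     # (2) b ^= t AND (low bits)
    and_temp(s, high, t, low[:len(high) - 2])   # (3) reset t


def check_and(m):
    """Exhaustive check over all values of a and b, with t starting in 0."""
    a, b, t = list(range(m)), m, m + 1
    for bits in product((0, 1), repeat=m + 1):
        s = list(bits) + [0]
        and_gate(s, a, b, t)
        expected = list(bits)
        expected[b] ^= int(all(bits[:m]))
        assert s == expected + [0]
    return True


if __name__ == "__main__":
    print(all(check_and(m) for m in range(1, 8)))
```

The borrowed qubits may hold arbitrary values; the check confirms that they, and the temporary qubit t, are returned to their starting values.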


6.4.2 Efficient Implementation of Multiply Controlled Single-Qubit Transformations 

The linear implementation of And given in the last section enables a linear implementation of the 
multiply controlled single-qubit transformations $\bigwedge_x^i Q$ of section 5.4.3. Given an $m$-bit bit string 
$z$, let $X^{(z)}$ be the transformation 

$$X^{(z)} = X \otimes \cdots \otimes I$$

which contains an $X$ at any position where $z$ has a 0 bit, and an $I$ at any position where $z$ has a 1 
bit. We implement the transformation Conditional(z, Q), which acts on qubit b with single-qubit 
transformation Q if and only if the bits of register a match bit string z. 


    define Conditional(z, Q) |a[m]⟩|b[1]⟩ = 
        qubit t                  use a temporary qubit                   (1) 
        X^{(z)}|a⟩               if a and z match, a becomes all 1's     (2) 
        And |a⟩|t⟩               and bits of a                           (3) 
        |t⟩ control Q|b⟩         if a matched z, apply Q to b            (4) 
        And |a⟩|t⟩               uncompute and                           (5) 
        X^{(z)}|a⟩               uncompute match                         (6) 

This construction uses 2 additional qubits and only $O(m)$ simple gates. When $z$ is 11...1 and 
$Q = X$, then Conditional(z, Q) is simply the And operator of the previous section. 
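
The compute, act, uncompute pattern of Conditional can be traced on classical bits for the special case $Q = X$. The following is a minimal sketch under that assumption; the And operator is replaced by Python's `all`, and the function name `conditional_x` is illustrative. Comments give the corresponding program steps.

```python
from itertools import product


def conditional_x(a_bits, b, z_bits):
    """Classical trace of Conditional(z, X): flip b iff a matches z.

    Returns (a, b, t) after the program; a and t should be restored.
    """
    t = 0                                                # (1) temporary qubit
    a = [x ^ (1 - z) for x, z in zip(a_bits, z_bits)]    # (2) apply X^(z)
    t ^= int(all(a))                                     # (3) And |a>|t>
    b ^= t                                               # (4) |t> control X|b>
    t ^= int(all(a))                                     # (5) uncompute and
    a = [x ^ (1 - z) for x, z in zip(a, z_bits)]         # (6) uncompute match
    return a, b, t


if __name__ == "__main__":
    for a in product((0, 1), repeat=3):
        for z in product((0, 1), repeat=3):
            for b in (0, 1):
                assert conditional_x(a, b, z) == (list(a), b ^ int(a == z), 0)
    print("b is flipped exactly when a matches z; a and t are restored")
```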

6.4.3 In-Place Addition 

We define an Add transformation that adds two $n$-bit binary numbers. The transformation 

    Add : |c⟩|a⟩|b⟩ → |c⟩|a⟩|(a + b + c) mod 2^{n+1}⟩, 

where a and c are $n$-qubit registers and b is an $(n + 1)$-qubit register, adds two $n$-bit numbers, 
placed in registers a and b, and puts the result in register b when register c and the highest order 
bit, $b_n$, of register b are initially 0. 

The implementation of Add uses $n$ recursion steps, where $n$ is the number of bits in the numbers 
to be added. The $i$th step in the recursion adds the $n - i$ highest bits, with the carry in the lowest of 
these $n - i$ highest bits having first been computed. The construction uses Sum and Carry defined 
in section 6.3.2. We consider the two cases $n = 1$ and $n > 1$: 

    define Add |c⟩|a⟩|b[2]⟩ =                                base case n = 1 
        Carry |c⟩|a⟩|b_0⟩|b_1⟩           carry in high bit of b             (1) 
        Sum |c⟩|a⟩|b_0⟩                  sum in low bit of b                (2) 

    define Add |c[n]⟩|a[n]⟩|b[n+1]⟩ =                        general case n > 1 
        Carry |c_0⟩|a_0⟩|b_0⟩|c_1⟩       compute the carry for low bits     (3) 
        Add |c_{n-1} ... c_1⟩|a_{n-1} ... a_1⟩|b_n ... b_1⟩    add n - 1 highest bits    (4) 
        Carry^{-1} |c_0⟩|a_0⟩|b_0⟩|c_1⟩  uncompute the carry                (5) 
        Sum |c_0⟩|a_0⟩|b_0⟩              compute the low order bit          (6) 

Step (5) is needed to ensure that the carry register is reset to its initial value. The Carry^{-1} operator 
is implemented by running, in reverse order, the inverse of each transformation in the definition 
of the Carry operator. 
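
The recursion can be simulated gate by gate on classical bits. The following is a minimal sketch; the layout of the registers in a Python list and the names `run_add` and `carry_inv` are assumptions for illustration. The carry register starts at 0, and the check confirms both the sum and that the carry register is restored.

```python
def cnot(s, c, t):
    s[t] ^= s[c]


def toffoli(s, c1, c2, t):
    s[t] ^= s[c1] & s[c2]


def sum_gate(s, c, a, b):
    cnot(s, a, b)
    cnot(s, c, b)


def carry(s, c, a, b, c_out):
    toffoli(s, a, b, c_out)
    cnot(s, a, b)
    toffoli(s, c, b, c_out)
    cnot(s, a, b)


def carry_inv(s, c, a, b, c_out):
    """Inverse of carry: the same self-inverse gates in reverse order."""
    cnot(s, a, b)
    toffoli(s, c, b, c_out)
    cnot(s, a, b)
    toffoli(s, a, b, c_out)


def add(s, c, a, b):
    """Add on index lists c[n], a[n], b[n + 1]; index 0 is the low-order bit."""
    n = len(a)
    if n == 1:
        carry(s, c[0], a[0], b[0], b[1])       # (1)
        sum_gate(s, c[0], a[0], b[0])          # (2)
        return
    carry(s, c[0], a[0], b[0], c[1])           # (3)
    add(s, c[1:], a[1:], b[1:])                # (4)
    carry_inv(s, c[0], a[0], b[0], c[1])       # (5)
    sum_gate(s, c[0], a[0], b[0])              # (6)


def run_add(x, y, n):
    """Load x into a and y into b, run Add, and return the contents of b."""
    c = list(range(n))
    a = list(range(n, 2 * n))
    b = list(range(2 * n, 3 * n + 1))
    s = [0] * (3 * n + 1)
    for i in range(n):
        s[a[i]] = (x >> i) & 1
    for i in range(n + 1):
        s[b[i]] = (y >> i) & 1
    add(s, c, a, b)
    assert all(s[i] == 0 for i in c)           # carry register is reset
    assert sum(s[a[i]] << i for i in range(n)) == x
    return sum(s[b[i]] << i for i in range(n + 1))


if __name__ == "__main__":
    print(run_add(5, 6, 3))  # adds two 3-bit numbers into a 4-bit register
```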

6.4.4 Modular Addition 

The following program defines modular addition for $n$-bit binary numbers a and b. 

    AddMod |a⟩|b⟩|M⟩ → |a⟩|(b + a) mod M⟩|M⟩, 

where the registers a and M have $n$ qubits and b is an $(n + 1)$-qubit register. When the highest order 
bit, $b_n$, of register b is initially 0, the transformation AddMod replaces the contents of register 
b with $(b + a) \bmod M$, where $M$ is the contents of register M. The contents of registers a and M 
(and the temporaries c and t) are unchanged by AddMod. The construction makes use of the 
Add transformation we defined in the previous section. 

    define AddMod |a[n]⟩|b[n+1]⟩|M[n]⟩ = 
        qubit t                          use a temporary bit                 (1) 
        qubit c[n]                       storage for the n-bit carry         (2) 
        Add |c⟩|a⟩|b⟩                    add a to b                          (3) 
        Add^{-1} |c⟩|M⟩|b⟩               subtract M from b                   (4) 
        |b_n⟩ control X|t⟩               toggle t when underflow             (5) 
        |t⟩ control Add |c⟩|M⟩|b⟩        when underflow, add M back to b     (6) 
        Add^{-1} |c⟩|a⟩|b⟩               subtract a again                    (7) 
        ¬|b_n⟩ control X|t⟩              reset t                             (8) 
        Add |c⟩|a⟩|b⟩                    construct final result              (9) 

Classically, steps (3) through (6) are all that are needed. In (4) if $M > b$, subtracting $M$ from $b$ 
causes $b_n$ to become 1. Steps (7) through (9) are needed to reset t. Note that each Add operation 
internally resets |c⟩ back to its original value. 

The condition $0 \le a, b < M$ is necessary, since for values outside that range, an operation that 
sends $|a, b, M\rangle$ to $|a, (b + a) \bmod M, M\rangle$ is not reversible and therefore not unitary. If this condition 
does not hold, for example if $b \ge M$ initially, then the final value of b may still be greater than 
$M$, since the algorithm subtracts $M$ at most once. 
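
The role of steps (7) through (9) in resetting t can be traced at the level of integers. The following is a minimal sketch that assumes Add and its inverse act as addition and subtraction modulo $2^{n+1}$ on the $(n + 1)$-bit register b; the function name `add_mod_steps` is illustrative. It returns the final contents of b together with the final value of t.

```python
def add_mod_steps(a, b, M, n):
    """Trace AddMod on integers; b is an (n + 1)-bit register, 0 <= a, b < M < 2**n."""
    mask = (1 << (n + 1)) - 1
    t = 0
    b = (b + a) & mask            # (3) add a to b
    b = (b - M) & mask            # (4) subtract M from b
    t ^= (b >> n) & 1             # (5) toggle t when the high bit signals underflow
    if t:
        b = (b + M) & mask        # (6) add M back when underflow occurred
    b = (b - a) & mask            # (7) subtract a again
    t ^= 1 - ((b >> n) & 1)       # (8) flip t when the high bit is 0
    b = (b + a) & mask            # (9) construct the final result
    return b, t


if __name__ == "__main__":
    n = 3
    for M in range(1, 2 ** n):
        for a in range(M):
            for b in range(M):
                assert add_mod_steps(a, b, M, n) == ((a + b) % M, 0)
    print("t is reset and b holds (a + b) mod M for all 0 <= a, b < M")
```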

6.4.5 Modular Multiplication 

The TimesMod transformation multiplies two $n$-bit binary numbers a and b modulo another $n$-bit 
binary number $M$. The transformation 

    TimesMod |a⟩|b⟩|M⟩|p⟩ → |a⟩|b⟩|M⟩|(p + ba) mod M⟩ 

is defined by the following program that successively adds $2^i a \bmod M$ to the result register p 
for each bit $b_i$ of b that is set. 
It is assumed that $a < M$, but b can be arbitrary. Both a and p are $(n + 1)$-qubit registers; the 
additional high-order bit is needed for intermediate results. The operation Shift simply cyclically 
shifts all bits by 1, which can easily be done by swapping bits $a_{i+1}$ with $a_i$ for all $i$, starting 
with the high-order bits. Shift acts as multiplication by 2, since the high-order bit of a will 
be 0. 

    define TimesMod |a[n+1]⟩|b[k]⟩|M[n]⟩|p[n+1]⟩ = 
        qubit t[k]                               use k temporary bits            (1) 
        qubit c[n]                               carry register for addition     (2) 
        for i ∈ [0 ... k-1]                      iterate through bits of b       (3) 
            Add^{-1} |c⟩|M⟩|a⟩                   subtract M from a               (4) 
            |a_n⟩ control X|t_i⟩                 t_i = 1 if M > a                (5) 
            |t_i⟩ control Add |c⟩|M⟩|a⟩          add M to a if t_i is set        (6) 
            |b_i⟩ control AddMod |a_{n-1} ... a_0⟩|p⟩|M⟩    add a to p if b_i is set    (7) 
            Shift |a⟩                            multiply a by 2                 (8) 
        for i ∈ [k-1 ... 0]                      clear t and restore a           (9) 
            Shift^{-1} |a⟩                       divide a by 2                   (10) 
            |t_i⟩ control Add^{-1} |c⟩|M⟩|a⟩     perform all steps in reverse    (11) 
            |a_n⟩ control X|t_i⟩                 clear ith bit of t              (12) 
            Add |c⟩|M⟩|a⟩                        add M to a                      (13) 

Lines (4)-(6) compute $a \bmod M$. The second loop, (9)-(13), undoes all the steps of the first 
one, (3)-(8), except the conditional addition to the output p (line 7). 

Note that modular multiplication cannot be defined as an in-place operation because the transformation 
that sends $|a, b, M\rangle$ to $|a, ab \bmod M, M\rangle$ is not unitary: both $|2, 1, 4\rangle$ and $|2, 3, 4\rangle$ 
would be mapped to the same state $|2, 2, 4\rangle$. 
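
The two loops can be traced at the level of integers to confirm that the second loop restores a and clears t while leaving the product in p. The following is a minimal sketch under stated assumptions: Add and its inverse are modeled as arithmetic modulo $2^{n+1}$, AddMod is modeled by Python's `%` operator (valid for $0 \le a, p < M$), and all function names are illustrative.

```python
def shift_left(a, n):
    """Cyclic left shift of an (n + 1)-bit value."""
    mask = (1 << (n + 1)) - 1
    return ((a << 1) | (a >> n)) & mask


def shift_right(a, n):
    """Cyclic right shift of an (n + 1)-bit value (inverse of shift_left)."""
    return (a >> 1) | ((a & 1) << n)


def times_mod_steps(a, b, M, p, n, k):
    """Trace TimesMod on integers; returns final (a, p, t)."""
    mask = (1 << (n + 1)) - 1
    t = [0] * k
    for i in range(k):                       # (3)
        a = (a - M) & mask                   # (4) subtract M from a
        t[i] ^= (a >> n) & 1                 # (5) t_i = 1 if M > a
        if t[i]:
            a = (a + M) & mask               # (6) add M back
        if (b >> i) & 1:
            p = (p + a) % M                  # (7) stands in for AddMod
        a = shift_left(a, n)                 # (8) multiply a by 2
    for i in reversed(range(k)):             # (9)
        a = shift_right(a, n)                # (10) divide a by 2
        if t[i]:
            a = (a - M) & mask               # (11)
        t[i] ^= (a >> n) & 1                 # (12) clear ith bit of t
        a = (a + M) & mask                   # (13)
    return a, p, t


if __name__ == "__main__":
    print(times_mod_steps(3, 6, 5, 2, n=3, k=3))  # (p + a*b) mod M with a, t restored
    # The in-place map |a, b, M> -> |a, ab mod M, M> is not injective:
    print((2 * 1) % 4, (2 * 3) % 4)
```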

6.4.6 Modular Exponentiation 

We implement modular exponentiation, 

    ExpMod |a⟩|b⟩|M⟩|0⟩ → |a⟩|b⟩|M⟩|a^b mod M⟩ 

using $O(n^2)$ temporary qubits where a, b, and M are $n$-qubit registers. 

First, we define two transformations we will use in our implementation of ExpMod, an $n$-bit 
copy and an $n$-bit modular squaring function. The Copy transformation 

    Copy : |a⟩|b⟩ → |a⟩|a ⊕ b⟩ 

copies the contents of an $n$-bit register a to another $n$-bit register b whenever the register b is 
initialized to 0. The operation Copy can be implemented as bitwise xor operations between the 
corresponding bits in registers a and b. 

    define Copy |a[n]⟩|b[n]⟩ = 
        for i ∈ [0..n-1]                 bit-wise 
            |a_i⟩ control X|b_i⟩         xor a with b 

The modular squaring operation SquareMod 

    SquareMod : |a⟩|M⟩|s⟩ → |a⟩|M⟩|(s + a^2) mod M⟩ 

places the result of squaring the contents of register a, modulo the contents of register M, in the 
register s. 

    define SquareMod |a[n+1]⟩|M[n]⟩|s[n+1]⟩ = 
        qubit t[n]                           use n temporary bits        (1) 
        Copy |a_{n-1} ... a_0⟩|t⟩            copy n bits of a to t       (2) 
        TimesMod |a⟩|t⟩|M⟩|s⟩                compute a^2 mod M           (3) 
        Copy^{-1} |a_{n-1} ... a_0⟩|t⟩       clear t                     (4) 

Finally, we can give a recursive definition of modular exponentiation with the signature 

    ExpMod : |a⟩|b⟩|M⟩|p⟩|e⟩ → |a⟩|b⟩|M⟩|p⟩|e ⊕ (p a^b) mod M⟩ 

    define ExpMod |a[n+1]⟩|b[1]⟩|M[n]⟩|p[n+1]⟩|e[n+1]⟩ =             base case 
        ¬|b_0⟩ control Copy |p⟩|e⟩                   result is p                     (1) 
        |b_0⟩ control TimesMod |a⟩|p⟩|M⟩|e⟩          result is p a^1 mod M           (2) 

    define ExpMod |a[n+1]⟩|b[k]⟩|M[n]⟩|p[n+1]⟩|e[n+1]⟩ =             general case k > 1 
        qubit u[n+1]                                 for a^2 mod M                   (3) 
        qubit v[n+1]                                 for (p a^{b_0}) mod M           (4) 
        ¬|b_0⟩ control Copy |p⟩|v⟩                   v = p a^0 mod M                 (5) 
        |b_0⟩ control TimesMod |a⟩|p⟩|M⟩|v⟩          v = p a^1 mod M                 (6) 
        SquareMod |a⟩|M⟩|u⟩                          compute a^2 mod M in u          (7) 
        ExpMod |u⟩|b_{k-1} ... b_1⟩|M⟩|v⟩|e⟩         compute v (a^2)^{b/2} mod M     (8) 
        SquareMod^{-1} |a⟩|M⟩|u⟩                     uncompute u                     (9) 
        |b_0⟩ control TimesMod^{-1} |a⟩|p⟩|M⟩|v⟩     uncompute v                     (10) 
        ¬|b_0⟩ control Copy^{-1} |p⟩|v⟩              uncompute v                     (11) 

The program unfolds recursively $k$ times, once for each bit of b. Steps (5)-(8) and the base case 
(1) and (2) perform the classical computation. The division $b/2$ in step (8) is integer division. 
Each recursive step requires two temporary registers of size $n + 1$ that are reset at the end in steps 
(9)-(11). Thus, the algorithm requires a total of $2(k - 1)(n + 1)$ temporary qubits. 

The algorithm for modular multiplication given in 6.4.5 requires $O(n^2)$ steps to multiply 
two $n$-bit numbers. Thus, the modular exponentiation requires $O(kn^2)$ steps. But more efficient 
multiplication algorithms are possible and this complexity can be reduced to $O(kn \log n \log\log n)$ 
using the Schönhage-Strassen multiplication algorithm. 
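
The classical content of the recursion, steps (5) through (8) together with the base case, is ordinary square-and-multiply processed from the low-order bit of b. The following is a minimal sketch of that classical computation only; it does not model the uncomputation steps, and the function names are illustrative. The variables `u` and `v` play the roles of the temporary registers of the same names.

```python
def exp_mod(a, b_bits, M, p):
    """Return (p * a**b) mod M, where b_bits[0] is the low-order bit b_0 of b."""
    b0 = b_bits[0]
    v = (p * a) % M if b0 else p % M   # steps (5)/(6), or the base case (1)/(2)
    if len(b_bits) == 1:
        return v
    u = (a * a) % M                    # step (7): a^2 mod M
    return exp_mod(u, b_bits[1:], M, v)  # step (8): v * (a^2)^(b // 2) mod M


def temp_qubits(k, n):
    """Temporary qubits used by the recursion: two (n + 1)-qubit registers per level."""
    return 2 * (k - 1) * (n + 1)


def to_bits(b, k):
    """Low-order-first list of the k bits of b."""
    return [(b >> i) & 1 for i in range(k)]


if __name__ == "__main__":
    print(exp_mod(3, to_bits(13, 4), 7, 1), pow(3, 13, 7))
    print(temp_qubits(4, 3))
```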

6.5 References 

See Feynman’s Lectures on Computation [121] for an account of reversible computation and its 
relation to the energy of computation and information. 

In his 1980 paper [270], Tommaso Toffoli shows that any (classical) function with finite domain 
and range can be realized as a reversible function using additional bits. To prove this theorem, 
he introduces a family of controlled gates $\theta^{(n)}$ that we write as $\bigwedge_{n-1} X$. The instance $\bigwedge_2 X$ is 
generally known as the Toffoli gate. The Fredkin gate was first described as a billiard-ball gate 
in [124]. 

Reversible classical computations were first discussed by Bennett in [39], where he constructs 
reversible Turing machines from nonreversible ones. In [40] Bennett discusses the recursive 
decomposition presented in section 6.2. Bennett’s argument uses multitape Turing machines 
instead of registers. 

Deutsch [99] shows how to construct reversible quantum gates for any classically computable 
function. Deutsch defines, and Yao [287] and Bernstein and Vazirani [48] refine, the definition of 
a universal quantum Turing machine. This construction assumes a sufficient supply of qubits that 
correspond to the finite but unbounded tape of a Turing machine. Section 7.2 discusses quantum 
Turing machines briefly. 

The implementations of the $m$-way and and Conditional(z, Q) are due to Barenco et al. [31], 
who also describe an $O(n^2)$-gate circuit for Conditional(z, Q) that uses no additional qubits. Vedral, 
Barenco, and Ekert [275] give a comprehensive definition of quantum circuits for arithmetic 
operations. In particular, they show how modular exponentiation $a^x \bmod M$ can be done with 
fewer temporary qubits than the version presented here for the case where $a$ and $M$ are classical 
and relatively prime. Fast multiplication was first described in Schönhage and Strassen’s paper 
[245]. Descriptions in English can be found in most books on algorithms such as [182]. 

6.6 Exercises 

Exercise 6.1 . Show that it is impossible to perform a reversible and operation with only two bits. 

Exercise 6.2. 

a. Construct a classical Boolean circuit with three input bits and two output bits that computes 
as a two-bit binary number the number of 1 bits in the input. 

b. Convert your circuit into a classical reversible one. 

c. Give an equivalent quantum circuit. 

Exercise 6.3. Given two-qubit registers |c⟩ and |a⟩ and three-qubit register |b⟩, construct the 
quantum circuit that computes Add |c⟩|a⟩|b⟩. 

Exercise 6.4. 

a. Define a quantum algorithm that computes the maximum of two n-qubit registers. 

b. Explain why such an algorithm requires one additional qubit that cannot be reused, that is, the 
algorithm will have to have $2n + 1$ input and output qubits. 

Exercise 6.5. Show how to construct an efficient reversible circuit for every classical circuit along 
the lines of the construction of section 6.2.2, but without the assumption that t is a power of 2. 
Give the time and space bounds for your construction. 

Introduction to Quantum Algorithms 


The previous chapter used quantum computers in an essentially classical manner; in each of the 
algorithms of part I, if the quantum computer starts in a standard basis state, the state after every 
step of the computation is also a standard basis vector, not a superposition, so the computational 
state always has an obvious interpretation as a classical state. These algorithms do not make 
use of the ability of qubits to be in superposition or of sets of qubits to be entangled. In part I, 
we showed that quantum computation is at least as powerful as classical computation: for any 
classical circuit, there exists a quantum circuit that performs the same computation with similar 
efficiency. We now turn our attention to showing that quantum computation is more powerful than 
classical computation. Part II is concerned with truly quantum algorithms, quantum computations 
that outperform classical ones. 

The algorithms in this part make use of the simple gates used in the quantum analogs of 
classical computations of chapter 6, and they also use more general unitary transformations that 
have no classical counterpart. Geometrically, all quantum state transformations on n qubits are 
rotations of $2^n$-dimensional complex state space. Nonclassical quantum computations involve 
rotations to nonstandard bases, whereas, as explained in section 6.1, the steps of any classical 
computation merely permute the standard basis elements. Section 5.4.4 showed how any quantum 
transformation can be implemented in terms of simple gates. We now concentrate on quantum 
transformations that can be implemented efficiently and how such transformations can be used to 
speed up certain types of computation. The key to designing a truly quantum algorithm is figuring 
out how to use these nonclassical basic unitary gates to perform a computation more efficiently. 

In this and the next few chapters, all discussion is in terms of the standard circuit model of 
quantum computation we described in section 5.6. We use the language introduced in 6.3 to specify 
general sequences of simple quantum gates as we did when we discussed quantum analogs of 
classical computations in chapter 6, but now we allow basic unitary transformations that have 
no classical counterpart. The way efficiency is computed in the quantum circuit model resembles 
the way it is computed classically, which makes it easy to compare the efficiency of quantum and 
classical algorithms. Early quantum algorithms were designed in the circuit model, but it is not 
the only, or necessarily the best, model to use for quantum algorithm design. Other models of 
quantum computation exist, and algorithms in these models have a different flavor. In chapter 13 
we describe alternative models that have been shown to be equivalent in terms of computational 
power to the standard circuit model of quantum computation. In addition to having led to new types 
of quantum algorithms, these models underlie some promising efforts to build quantum computers. 

In the standard circuit model of quantum computation, the efficiency of a quantum algorithm is 
computed in terms of the circuit complexity, the number of basic gates together with the number 
of qubits used, of the circuits used to implement the algorithm. Sometimes we are interested in the 
efficient use of other resources, so we will measure, say, the number of bits or qubits transmitted 
between two parties to carry out a task, or the number of calls to a (usually expensive to compute) 
function. Such functions are often called black box or oracle functions, since it is assumed that 
one does not have access to the inner workings of the computation of this function, only to the 
result of its application. These various notions of complexity are discussed in section 7.2. 

Section 7.1 begins the chapter with a general discussion of computing with superpositions, 
including the notion of quantum parallelism. Section 7.2 describes various notions of complexity 
including circuit complexity, query complexity, and communication complexity. Deutsch’s algorithm 
of section 7.3.1 provides the first example of a truly quantum algorithm, one for which there 
is no classical analog. The quantum subroutines of section 7.4 pave the way for the description 
in section 7.5 of four simple quantum algorithms, including Simon’s algorithm, which inspired 
Shor’s factoring algorithm. While the problems these algorithms solve are not so interesting, 
a study of the techniques they use will aid in understanding Grover’s algorithm and Shor’s 
algorithm. Section 7.7 defines quantum complexity and describes relations between quantum 
complexity classes and classical complexity classes. The final section of the chapter, section 7.8, 
discusses quantum Fourier transforms, which, in one form or another, are used in most of the 
algorithms described in this book. 

7.1 Computing with Superpositions 

Many quantum algorithms use quantum analogs of classical computation as at least part of their 
computation. Quantum algorithms often start by creating a quantum superposition and then feeding 
it into a quantum version $U_f$ of a classical circuit that computes a function $f$. This setup, called 
quantum parallelism, accomplishes nothing by itself (any algorithm that stopped at this point 
would have no advantage over a classical algorithm), but this construction leaves the system in a 
state that quantum algorithm designers have found a useful starting point. Both Shor’s algorithm 
and Grover’s algorithm begin with the quantum parallelism setup. 

7.1.1 The Walsh-Hadamard Transformation 

Quantum parallelism, the first step of many quantum algorithms, starts by using the Walsh-Hadamard 
transformation, a generalization of the Hadamard transformation, to create a superposition 
of all input values. Recall from section 5.2.2 that the Hadamard transformation $H$ applied 
to $|0\rangle$ creates a superposition state $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$. Applied to $n$ qubits individually, all in state $|0\rangle$, 
$H$ generates a superposition of all $2^n$ standard basis vectors, which can be viewed as the binary 
representation of the numbers from $0$ to $2^n - 1$: 

$$
\begin{aligned}
(H \otimes H \otimes \cdots \otimes H)|00\ldots0\rangle
&= \frac{1}{\sqrt{2^n}}\big((|0\rangle + |1\rangle) \otimes (|0\rangle + |1\rangle) \otimes \cdots \otimes (|0\rangle + |1\rangle)\big) \\
&= \frac{1}{\sqrt{2^n}}\big(|0\ldots00\rangle + |0\ldots01\rangle + |0\ldots10\rangle + \cdots + |1\ldots11\rangle\big) \\
&= \frac{1}{\sqrt{2^n}} \sum_{x=0}^{2^n-1} |x\rangle .
\end{aligned}
$$



Box 7.1 

Hamming Weight and Hamming Distance 


The Hamming distance $d_H(x, y)$ between two bit strings $x$ and $y$ is the number of bits in which the two 
strings differ. The Hamming weight $d_H(x)$ of a bit string $x$ is the number of 1-bits in $x$, which is equal 
to the Hamming distance between $x$ and the bit string consisting of all zeros: $d_H(x) = d_H(x, 0)$. 

For two bit strings $x$ and $y$, $x \cdot y$ is the number of common 1-bits in $x$ and $y$, $x \oplus y$ is the bitwise 
exclusive-OR, and $x \wedge y$ is the bitwise AND of $x$ and $y$. The bitwise exclusive-OR $\oplus$ can also be viewed 
as bitwise modular addition of the strings $x$ and $y$, viewed as elements of $\mathbf{Z}_2^n$. We use $\neg x$ to denote 
the bit string that flips 0 and 1 throughout bit string $x$, so $\neg x = x \oplus 11\ldots1$. 

The following identities hold: 

$$
\begin{aligned}
x \cdot y &= d_H(x \wedge y) \\
(x \cdot y \bmod 2) &= \tfrac{1}{2}\big(1 - (-1)^{x \cdot y}\big) \\
x \cdot y + x \cdot z &=_2 x \cdot (y \oplus z) \\
d_H(x \oplus y) &=_2 d_H(x) + d_H(y)
\end{aligned}
$$

where the notation $x =_2 y$ means equality modulo 2; it is shorthand for $x \bmod 2 = y \bmod 2$. Note 
that 

$$\sum_{x=0}^{2^n-1} (-1)^{x \cdot x} = 0$$

since the successive ($2i$ and $2i + 1$) terms cancel. 
Finally, we note that 

$$\sum_{x=0}^{2^n-1} (-1)^{x \cdot y} = \begin{cases} 2^n & \text{if } y = 0 \\ 0 & \text{otherwise.} \end{cases} \qquad (7.1)$$
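
The identities of box 7.1 can be checked by brute force for small $n$. The following is a minimal sketch in Python; the helper names `dot`, `hamming_weight`, and `signed_sum` are illustrative choices, not notation from the text, and bit strings are represented as nonnegative integers.

```python
def dot(x: int, y: int) -> int:
    """Number of common 1-bits of x and y (the x . y of box 7.1)."""
    return bin(x & y).count("1")


def hamming_weight(x: int) -> int:
    """Number of 1-bits in x."""
    return bin(x).count("1")


def signed_sum(y: int, n: int) -> int:
    """Sum over all n-bit strings x of (-1)^(x . y), the left side of (7.1)."""
    return sum((-1) ** dot(x, y) for x in range(2 ** n))


n = 4
for y in range(2 ** n):
    # Identity (7.1): the sum is 2^n when y = 0 and 0 otherwise.
    assert signed_sum(y, n) == (2 ** n if y == 0 else 0)
    for x in range(2 ** n):
        # x . y = d_H(x AND y), and parity of d_H(x XOR y) matches d_H(x) + d_H(y).
        assert dot(x, y) == hamming_weight(x & y)
        assert hamming_weight(x ^ y) % 2 == (hamming_weight(x) + hamming_weight(y)) % 2
```

The check is exhaustive only for the chosen $n$; it illustrates the identities rather than proving them.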
The transformation $W = H \otimes H \otimes \cdots \otimes H$, which applies $H$ to each of the qubits in an $n$-qubit 
state, is called the Walsh, or Walsh-Hadamard, transformation. Using $N = 2^n$, we may write 

$$W|0\rangle = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x\rangle .$$

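Another way of writing $W$ is useful for understanding the effect of $W$ in quantum algorithms. In 
the standard basis, the matrix for the $n$-qubit Walsh-Hadamard transformation is a $2^n \times 2^n$ matrix 
$W$ with entries $W_{rs}$, such that 

$$W_{sr} = W_{rs} = \frac{1}{\sqrt{2^n}} (-1)^{r \cdot s},$$

where $r \cdot s$ is the number of common one-bits in $s$ and $r$ (see box 7.1) and both $r$ and $s$ range from 
$0$ to $2^n - 1$. To see this equality, note that 

$$W(|r\rangle) = \sum_s W_{rs} |s\rangle .$$

Let $r_{n-1} \ldots r_0$ be the binary representation of $r$, and $s_{n-1} \ldots s_0$ be the binary representation of $s$. 

$$
\begin{aligned}
W(|r\rangle) &= (H \otimes \cdots \otimes H)(|r_{n-1}\rangle \otimes \cdots \otimes |r_0\rangle) \\
&= \frac{1}{\sqrt{2^n}} (|0\rangle + (-1)^{r_{n-1}}|1\rangle) \otimes \cdots \otimes (|0\rangle + (-1)^{r_0}|1\rangle) \\
&= \frac{1}{\sqrt{2^n}} \sum_{s=0}^{2^n-1} (-1)^{s_{n-1} r_{n-1}} |s_{n-1}\rangle \otimes \cdots \otimes (-1)^{s_0 r_0} |s_0\rangle \\
&= \frac{1}{\sqrt{2^n}} \sum_{s=0}^{2^n-1} (-1)^{s \cdot r} |s\rangle .
\end{aligned}
$$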
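
As a numerical illustration of the closed form for the entries of $W$, the sketch below builds the $n$-qubit Walsh-Hadamard matrix as an $n$-fold tensor (Kronecker) product of $H$ and compares it with $\frac{1}{\sqrt{2^n}}(-1)^{r \cdot s}$. The function names `kron`, `walsh_hadamard`, and `closed_form` are hypothetical helpers introduced here, and matrices are plain lists of rows.

```python
import math


def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


R = 1 / math.sqrt(2)
H = [[R, R], [R, -R]]  # single-qubit Hadamard matrix in the standard basis


def walsh_hadamard(n):
    """Matrix of W = H (x) H (x) ... (x) H on n qubits."""
    w = [[1.0]]
    for _ in range(n):
        w = kron(w, H)
    return w


def closed_form(n):
    """Matrix with entries (-1)^(r . s) / sqrt(2^n), where r . s counts common 1-bits."""
    scale = 1 / math.sqrt(2 ** n)
    size = 2 ** n
    return [[scale * (-1) ** bin(r & s).count("1") for s in range(size)]
            for r in range(size)]
```

Comparing `walsh_hadamard(n)` with `closed_form(n)` entry by entry, up to floating-point rounding, is one way to sanity-check the derivation for small $n$.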


7.1.2 Quantum Parallelism 

Any transformation of the form $U_f : |x, y\rangle \to |x, y \oplus f(x)\rangle$ from section 6.1 is linear and 
therefore acts on a superposition $\sum_x a_x |x\rangle$ of input values as follows: 

$$U_f : \sum_x a_x |x, 0\rangle \to \sum_x a_x |x, f(x)\rangle .$$

Consider the effect of applying $U_f$ to the superposition of values from $0$ to $2^n - 1$ obtained from 
the Walsh transformation: 

$$U_f : (W|0\rangle) \otimes |0\rangle = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x\rangle|0\rangle \to \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x\rangle|f(x)\rangle .$$

After only one application of $U_f$, the superposition now contains all of the $2^n$ function values 
$f(x)$ entangled with their corresponding input value $x$. This effect is called quantum parallelism. 
Since $n$ qubits enable us to work simultaneously with $2^n$ values, quantum parallelism in some 
sense circumvents the time/space trade-off of classical parallelism through its ability to hold 
exponentially many computed values in a linear amount of physical space. However, this effect 
is less powerful than it may initially appear. 

To begin with, it is possible to gain only limited information from this superposition: these $2^n$ 
values of $f$ are not independently accessible. We can gain information only from measuring the 
states, but measuring in the standard basis will project the final state onto a single input/output 
pair $|x, f(x)\rangle$, and a random one at that. The following simple example uses the basic setup of 
quantum parallelism and illustrates how useless the raw superposition arising from quantum 
parallelism is on its own, without performing any additional transformations. 


Example 7.1.1 The controlled-controlled-NOT (Toffoli) gate, $T$, of section 5.4.3 computes the 
conjunction of two values: 

$$T : |x\rangle|y\rangle|0\rangle \to |x\rangle|y\rangle|x \wedge y\rangle .$$

Take as input the superposition of all possible bit combinations of $x$ and $y$ together with a single-qubit 
register, initially set to $|0\rangle$, to contain the output. We use quantum parallelism to construct 
this input state in the standard way: 

$$
\begin{aligned}
W(|00\rangle) \otimes |0\rangle &= \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \otimes \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \otimes |0\rangle \\
&= \frac{1}{2}(|000\rangle + |010\rangle + |100\rangle + |110\rangle).
\end{aligned}
$$

Applying the Toffoli gate $T$ to this superposition of inputs yields 

$$T(W|00\rangle \otimes |0\rangle) = \frac{1}{2}(|000\rangle + |010\rangle + |100\rangle + |111\rangle).$$

This superposition can be viewed as a truth table for conjunction. The values of $x$, $y$, and $x \wedge y$ 
are entangled in such a way that measuring in the standard basis will give one line of the truth 
table. Computing the AND using quantum parallelism, and then measuring in the standard basis, 
gives no advantage over classical parallelism: only one result is obtained and, worse still, we 
cannot even choose which result we get. 
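
Example 7.1.1 can be mimicked with a small state-vector sketch. In the Python below, a state is stored as a dictionary from basis labels $(x, y, z)$ to amplitudes; the names `toffoli` and `measure` and the fixed random seed are assumptions made for this illustration only.

```python
import random


def toffoli(state):
    """Apply T: |x, y, z> -> |x, y, z XOR (x AND y)> to {(x, y, z): amplitude}."""
    out = {}
    for (x, y, z), amp in state.items():
        key = (x, y, z ^ (x & y))
        out[key] = out.get(key, 0.0) + amp
    return out


def measure(state, rng):
    """Sample one standard-basis label with probability |amplitude|^2."""
    keys = list(state)
    weights = [abs(state[k]) ** 2 for k in keys]
    return rng.choices(keys, weights=weights, k=1)[0]


# W|00> tensor |0>: equal superposition over x and y, output qubit set to 0.
inputs = {(x, y, 0): 0.5 for x in (0, 1) for y in (0, 1)}
outputs = toffoli(inputs)

rng = random.Random(0)          # seeded only to make the sketch repeatable
sample = measure(outputs, rng)  # a single, randomly chosen line of the truth table
```

Each call to `measure` returns exactly one triple $(x, y, x \wedge y)$, which is the point of the example: the superposition holds the whole truth table, but a standard-basis measurement reveals only one random row.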
7.2 Notions of Complexity 

Complexity theory analyzes the amount of resources, most often time or space, asymptotically 
required to perform a computation. Turing machines provide a formal model of computation 
often used for reasoning about computational complexity. Early work by Benioff, improved by 
David Deutsch, then Andrew Yao, then Ethan Bernstein and Umesh Vazirani, defined quantum 
Turing machines and enabled the formalization of quantum complexity and comparison 
with classical results. In both quantum and classical settings, other methods, such as the 
circuit model, provide alternative means for formalizing complexity notions. Because most 
research on quantum algorithms discusses complexity in terms of quantum circuit complexity, 
we have chosen to take that approach in this book. Another common complexity measure used 
in the analysis of quantum algorithms is quantum query complexity, which will be discussed 
in section 7.2.1. Furthermore, there are a number of complexity measures used for analyzing 
quantum communication protocols. Communication complexity will be discussed in section 
7.2.2. 

A circuit family $C = \{C_n\}$ consists of circuits $C_n$ indexed by the maximum input size for that 
circuit; the circuit $C_n$ handles input of size $n$ (bits or qubits). The complexity of a circuit $C$ 
is defined to be the number of simple gates in the circuit, where the set of simple gates under 
consideration must be specified. Any of the finite sets of gates discussed in section 5.5 may be used, 
or the infinite set consisting of all single-qubit operations together with the $C_{not}$ may be used. The 
circuit complexity, or time complexity, of a family of circuits $C = \{C_n\}$ is the asymptotic number 
of simple gates in the circuits expressed as a function of the input size; the circuit complexity 
for a circuit family $C = \{C_n\}$ is $O(f(n))$ if the size of the circuit is bounded by $O(f(n))$: the 
function $t(n) = |C_n|$ satisfies $t(n) \in O(f(n))$. Any of the simple gate sets mentioned earlier gives 
the same asymptotic circuit complexity. 

Circuit complexity models are nonuniform in that different, larger circuits are required to 
handle larger input sizes. Both quantum and classical Turing machines, by contrast, propose 
a single machine that can handle arbitrarily large input. The nonuniformity of circuit models 
makes circuit complexity more complicated to define than Turing machine models because of 
the following issue: complexity can be hidden in the complexity of constructing the circuits $C_n$ 
themselves, even if the size of the circuits $C_n$ is asymptotically bounded. To get sensible notions 
of complexity, in particular to obtain circuit complexity measures similar to Turing machine 
based ones, a separate uniformity condition must be imposed. Both quantum and classical circuit 
complexity use similar uniformity conditions. 

In addition to uniformity, a requirement that the circuits $C_n$ in a circuit family 
$C$ behave in a consistent manner is usually imposed as well. This consistency condition is usually 
phrased in terms of a function $g(x)$, and says that all circuits $C_n \in C$ that can take $x$ as input give 
$g(x)$ as output. This condition is sometimes misunderstood to include restrictions on the sorts of 
functions $g(x)$ a consistent circuit family can compute. For this reason, and to generalize easily 
to the quantum case, we phrase this same consistency condition without explicit reference to a 
function $g(x)$. 

Consistency Condition A quantum or classical circuit family $C$ is consistent if its circuits $C_n$ give 
consistent results: for all $m < n$, applying circuit $C_n$ to input $x$ of size $m$ must give the same result 
as applying $C_m$ to that input. 

The most common uniformity condition, and the one we impose here, is the polynomial 
uniformity condition. 

Uniformity condition A quantum or classical circuit family $C = \{C_n\}$ is polynomially uniform if 
there exists a polynomial-time classical algorithm that generates the circuits. In other words, $C$ 
is polynomially uniform if there exists a polynomial $f(n)$ and a classical program that, given $n$, 
constructs the circuit $C_n$ in at most $O(f(n))$ steps. 

The uniformity condition means that the circuit construction cannot be arbitrarily complex. 

The relation between the circuit complexity of polynomially uniform, consistent circuit families 
and the Turing machine complexity is understood for both the classical and quantum case. 
In the classical case, for any classical function $g(x)$ computable on a Turing machine in time 
$O(f(n))$, there is a polynomially uniform, consistent classical circuit family that computes $g(x)$ 
in time $O(f(n) \log f(n))$. Conversely, a polynomially uniform, consistent family of Boolean 
circuits can be simulated efficiently by a Turing machine. In the quantum case, Yao has shown 
that any polynomial time computation on a quantum Turing machine can be computed by a polynomially 
uniform, consistent family of polynomially sized quantum circuits. As in the classical 
case, demonstrating that any polynomially uniform, consistent family of quantum circuits can be 
simulated by a quantum Turing machine is straightforward. Since we are not concerned with 
sublinear complexity differences, asymptotic differences of at most a polynomial in $\log(f(n))$, 
we discuss quantum complexity in terms of circuit complexity with the polynomial uniformity 
condition instead of using quantum Turing machines. 

7.2.1 Query Complexity 

The earliest quantum algorithms solve black box, or oracle, problems. A classical black box outputs 
$f(x)$ upon input of $x$. A quantum black box behaves like $U_f$, outputting $\sum_x a_x |x, f(x) \oplus y\rangle$ 
upon input of $\sum_x a_x |x\rangle|y\rangle$. Black boxes are theoretical constructs; they may or may not have an 
efficient implementation. For this reason, they are often called oracles. The black box terminology 
emphasizes that only the output of a black box can be used to solve the problem, not anything 
about its implementation or any of the intermediate values computed along the way; we cannot 
see inside it. The most common type of complexity discussed with respect to black box problems 
is query complexity, how many calls to the oracle are required to solve the problem. 

Black box algorithms of low query complexity, algorithms that solve a black box problem with 
few calls to the oracle, are only of practical use if the black box has an efficient implementation. 
The black box approach is very useful, however, in establishing lower bounds on the circuit 
complexity of a problem. If the query complexity is $\Omega(N)$ (in other words, at least $\Omega(N)$ calls 
to the oracle are required), then the circuit complexity must be at least $\Omega(N)$. 
Black boxes have been used to establish lower bounds on the circuit complexity for quantum 
algorithms, but their first use in quantum computation was to show that the quantum query 
complexity of certain black box problems was strictly less than the classical query complexity: 
the number of calls to a quantum oracle needed to solve certain problems is strictly less than the 
required number of calls to a classical oracle to solve the same problem. 

The first few quantum algorithms solve black box problems: Deutsch’s problem (section 7.3.1), 
the Deutsch-Jozsa problem (section 7.5.1), the Bernstein-Vazirani problem (section 7.5.2), and 
Simon’s problem (section 7.5.3). The most famous query complexity result is Grover’s: that it 
takes only $O(\sqrt{N})$ calls to a quantum black box to solve an unstructured search problem over 
$N$ elements, whereas the classical query complexity of unstructured search is $\Omega(N)$. Grover’s 
algorithm, and the extent to which its superior query complexity provides practical benefit, are 
discussed in chapter 9. 
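
Query complexity counts calls to the black box and nothing else. The sketch below illustrates only the classical side of that accounting: a wrapper class, here called `CountingOracle` (a hypothetical name), records how often a classical black box is queried while a simple linear scan looks for a marked element.

```python
class CountingOracle:
    """Classical black box for f that records how many times it is called."""

    def __init__(self, f):
        self._f = f
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self._f(x)


def classical_search(oracle, size):
    """Query x = 0, 1, ... until the oracle returns 1; return that x, or None."""
    for x in range(size):
        if oracle(x) == 1:
            return x
    return None


N = 16
marked = N - 1  # worst case for this particular scan order
oracle = CountingOracle(lambda x: 1 if x == marked else 0)
found = classical_search(oracle, N)
# oracle.calls now holds the number of queries this run needed
```

In this worst case the scan queries every one of the $N$ inputs, in line with the $\Omega(N)$ classical bound quoted above. The code does not simulate the quantum $O(\sqrt{N})$ algorithm.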

7.2.2 Communication Complexity 

For communication protocols, common complexity measures include the minimum number of 
bits, or the minimum number of qubits, that must be transmitted to accomplish a task. Bounds on 
other resources, such as the number of bits of shared randomness or, in the quantum case, the number 
of shared EPR pairs, may or may not be of interest as well. Various notions of communication 
complexity exist, depending on whether the task requires quantum or classical information to 
be transmitted, whether qubits or bits can be sent, and what entanglement resources can be used. 

We have already seen some examples of communication complexity results. The complexity 
notion of interest in dense coding is the number of qubits that must be sent in order to communicate 
$n$ bits of information. While classical protocols require the transmission of $n$ bits, only $n/2$ qubits 
need to be sent in order to communicate $n$ bits of information. The other resource used in dense 
coding, the number of EPR pairs, sometimes called ebits in the communication protocol context, 
required in the setup is also $n/2$. Teleportation, by contrast, aims to transmit quantum information 
using a classical channel that can only send bits, not qubits. The relevant complexity notion is 
the number of bits needed to transmit $n$ qubits' worth of quantum information. Using quantum 
teleportation, $2n$ bits can be used to transmit the state of $n$ qubits. The number of ebits used to 
teleport $n$ qubits is $n$. 

The distributed computation protocol described in section 7.5.4 does not require the transmission 
of any bits or qubits, but it requires $n$ ebits in order to accomplish a task concerning 
exponentially large bit strings, bit strings of length $N = 2^n$. A classical solution to this problem 
requires a minimum of $N/2$ bits to be transmitted. Since this book is concerned primarily with 
quantum computation, not quantum communication, we will not discuss quantum communication 
complexity again except briefly in section 13.5. 

7.3 A Simple Quantum Algorithm 

We are now in a position to describe our first truly quantum algorithm. This algorithm, due 
to David Deutsch in 1985, was the first result that showed that quantum computation could 
outperform classical computation. The problem Deutsch’s algorithm solves is a black box problem. 
Deutsch showed that his quantum algorithm has better query complexity than any possible 
classical algorithm: it can solve the problem with fewer calls to the black box than is possible classically. 
While the problem it solves is too trivial to be of practical interest, the algorithm contains 
simple versions of a number of key elements of intrinsically quantum computation, including the 
use of nonstandard bases and quantum analogs of classical functions applied to superpositions, 
that will recur in more complex quantum algorithms. 

7.3.1 Deutsch's Problem 

Deutsch's Problem Given a Boolean function $f : \mathbf{Z}_2 \to \mathbf{Z}_2$, determine whether $f$ is constant. 

Deutsch’s quantum algorithm, described in this section, requires only a single call to a black 
box for $U_f$ to solve the problem. Any classical algorithm requires two calls to a classical black 
box for $C_f$, one for each input value. The key to Deutsch’s algorithm is the nonclassical ability to 
place the second qubit of the input to the black box in a superposition. The subroutine of section 
7.4.2 generalizes this trick. 

Recall from section 6.1 that $U_f$ for a single-bit function $f$ takes two qubits of input and produces 
two qubits of output. On input $|x\rangle|y\rangle$, $U_f$ produces $|x\rangle|f(x) \oplus y\rangle$, so when $|y\rangle = |0\rangle$, the result 
of applying $U_f$ is $|x\rangle|f(x)\rangle$. The algorithm applies $U_f$ to the two-qubit state $|+\rangle|-\rangle$, where the 
first qubit is a superposition of the two values in the domain of $f$, and the second qubit is in the 
superposition $|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$. We obtain 

$$
\begin{aligned}
U_f(|+\rangle|-\rangle) &= U_f\Big(\tfrac{1}{2}(|0\rangle + |1\rangle)(|0\rangle - |1\rangle)\Big) \\
&= \tfrac{1}{2}\big(|0\rangle(|0 \oplus f(0)\rangle - |1 \oplus f(0)\rangle) + |1\rangle(|0 \oplus f(1)\rangle - |1 \oplus f(1)\rangle)\big).
\end{aligned}
$$

In other words, 

$$U_f(|+\rangle|-\rangle) = \frac{1}{2} \sum_{x=0}^{1} |x\rangle\big(|0 \oplus f(x)\rangle - |1 \oplus f(x)\rangle\big).$$

When $f(x) = 0$, $\frac{1}{\sqrt{2}}(|0 \oplus f(x)\rangle - |1 \oplus f(x)\rangle)$ becomes $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) = |-\rangle$. When $f(x) = 1$, 
$\frac{1}{\sqrt{2}}(|0 \oplus f(x)\rangle - |1 \oplus f(x)\rangle)$ becomes $\frac{1}{\sqrt{2}}(|1\rangle - |0\rangle) = -|-\rangle$. Therefore 

$$U_f\Big(\frac{1}{\sqrt{2}} \sum_{x=0}^{1} |x\rangle|-\rangle\Big) = \frac{1}{\sqrt{2}} \sum_{x=0}^{1} (-1)^{f(x)} |x\rangle|-\rangle .$$

For $f$ constant, $(-1)^{f(x)}$ is just a physically meaningless global phase, so the state is simply 
$|+\rangle|-\rangle$. For $f$ not constant, the term $(-1)^{f(x)}$ negates exactly one of the terms in the superposition 
so, up to a global phase, the state is $|-\rangle|-\rangle$. If we apply the Hadamard transformation $H$ to the 
first qubit and then measure it, with certainty we obtain $|0\rangle$ in the first case and $|1\rangle$ in the second 
case. Thus with a single call to $U_f$ we can determine, with certainty, whether $f$ is constant 
or not. We now have our first example of a quantum algorithm that outperforms any classical 
algorithm! 
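
The calculation above can be replayed numerically. The following is a minimal sketch of Deutsch’s algorithm as a four-amplitude state-vector simulation in Python; amplitudes are indexed by $2x + y$ for basis state $|x, y\rangle$, and the function names are illustrative rather than taken from the text.

```python
import math


def apply_uf(state, f):
    """U_f: |x, y> -> |x, y XOR f(x)> on amplitudes indexed by 2*x + y."""
    out = [0.0] * 4
    for x in (0, 1):
        for y in (0, 1):
            out[2 * x + (y ^ f(x))] += state[2 * x + y]
    return out


def hadamard_on_first(state):
    """H tensor I on amplitudes indexed by 2*x + y."""
    r = 1 / math.sqrt(2)
    out = [0.0] * 4
    for y in (0, 1):
        a0, a1 = state[y], state[2 + y]
        out[y] = r * (a0 + a1)
        out[2 + y] = r * (a0 - a1)
    return out


def deutsch_probability_of_one(f):
    """Probability that the first qubit is measured as 1 after one call to U_f."""
    # |+>|-> = 1/2 (|00> - |01> + |10> - |11>)
    state = [0.5, -0.5, 0.5, -0.5]
    state = hadamard_on_first(apply_uf(state, f))
    return state[2] ** 2 + state[3] ** 2
```

For the two constant functions the returned probability is 0 (up to rounding), and for the two nonconstant functions it is 1, matching the claim that a single call to $U_f$ decides the question with certainty.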

It may surprise readers that this algorithm succeeds with certainty; the most commonly remembered 
aspect of quantum mechanics is its probabilistic nature, so people often naively expect that 
anything done with quantum means must be probabilistic, or at least that anything that exhibits 
peculiarly quantum properties must be probabilistic. We already know, from our study of quantum 
analogs to classical computations, that the first of these expectations does not hold. The 
algorithm for Deutsch’s problem shows that even inherently quantum processes do not have to 
be probabilistic. 

7.4 Quantum Subroutines 

We now look at some useful nonclassical operations that can be performed on a quantum computer. 
The first subroutine, discussed in section 7.4.2, is commonly used; in particular, it is part of 
Grover’s search algorithm as well as being used in most of the simpler quantum algorithms 
of section 7.5, including the Deutsch-Jozsa problem, a multiple bit generalization of Deutsch’s 
problem. To illustrate further how to work with quantum superpositions, we describe a couple of 
other subroutines, though these subroutines are not used elsewhere in the book. 

7.4.1 The Importance of Unentangling Temporary Qubits in Quantum Subroutines 

Chapter 6, when describing the constructions of section 6.2, emphasized the importance of uncomputing 
temporarily used bits to conserve space in classical computations. In quantum computation, 
uncomputing qubits used temporarily as part of subroutines is crucial even when conserving space 
and reusing qubits is not an issue; failing to uncompute temporary qubits can result in entanglement 
between the computational qubits and the temporary qubits, and in this way can destroy the calculation. 
More specifically, if a subroutine claims to compute state $\sum_i a_i|x_i\rangle$, it is not okay if it actually 
computes $\sum_i a_i|x_i\rangle|y_i\rangle$ and throws away the qubits storing $|y_i\rangle$ unless there is no entanglement 
between the two registers. There is no entanglement if $\sum_i a_i|x_i\rangle|y_i\rangle = (\sum_i a_i|x_i\rangle) \otimes |y_i\rangle$, which 
can happen only if $|y_i\rangle = |y_j\rangle$ for all $i$ and $j$. In general, the states $\sum_i a_i|x_i\rangle$ and $\sum_i a_i|x_i\rangle|y_i\rangle$ 
behave quite differently, even if we have access only to the first register of the second state. 
Chapter 10, which discusses quantum subsystems, provides the means of talking about the differences 
between these two situations without looking at the consequences for computation. In 
this section, we illustrate the difference by showing how using the second state when expecting the 
first can mess up computation. In particular, we show that if we replace the black box $U_f$ used 
in Deutsch’s problem with the black box for $V_f$ that outputs 

$$V_f : |x, t, y\rangle \to |x, t \oplus x, y \oplus f(x)\rangle,$$

Deutsch’s algorithm no longer works. 





Begin with qubit $|t\rangle$ in the state $|0\rangle$ and, as before, the first qubit in the state $|+\rangle$, and the third 
qubit in the state $|-\rangle$. Apply $V_f$ to obtain 

$$V_f(|+\rangle|0\rangle|-\rangle) = V_f\Big(\frac{1}{\sqrt{2}} \sum_{x=0}^{1} |x\rangle|0\rangle|-\rangle\Big) = \frac{1}{\sqrt{2}} \sum_{x=0}^{1} (-1)^{f(x)} |x\rangle|x\rangle|-\rangle .$$

The first qubit is now entangled with the second qubit. Because of this entanglement, applying 
$H$ to the first qubit and then measuring it no longer has the desired effect. For example, when $f$ 
is constant, the state is, up to a global phase, $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)|-\rangle$, and applying $H \otimes I \otimes I$ results in the state 

$$\frac{1}{2}(|00\rangle + |10\rangle + |01\rangle - |11\rangle)|-\rangle .$$

The second and fourth terms canceled before, but they do so no longer. Now there is an equal 
chance of measuring the first qubit as $|0\rangle$ or $|1\rangle$. A similar calculation shows that when the function 
is not constant, there is also an equal chance of measuring the first qubit as $|0\rangle$ or $|1\rangle$. Thus, we can 
no longer distinguish the two cases. Entanglement with the qubit $|t\rangle$ has destroyed the quantum 
computation. 

Had $V_f$ properly uncomputed $t$ so that at the end of the calculation it was in state $|0\rangle$, the 
algorithm would still work properly. For example, for $f$ constant, we would have state 

$$\frac{1}{2}(|00\rangle + |10\rangle + |00\rangle - |10\rangle)|-\rangle$$

in which case the appropriate terms would cancel to yield 

$$|00\rangle|-\rangle .$$


If a quantum subroutine claims to produce a state $|\psi\rangle$, it must not produce a state that looks like 
$|\psi\rangle$ but is entangled with other qubits. In particular, if a subroutine makes use of other qubits, 
by the end of the subroutine these qubits must not be entangled with the other qubits. For this 
reason, the following quantum subroutines are careful to uncompute any auxiliary qubits so that 
at the end of the algorithm they are always in state $|0\rangle$. 
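
The failure caused by the leftover qubit $t$ can also be seen numerically. The sketch below repeats Deutsch’s algorithm with the black box $V_f : |x, t, y\rangle \to |x, t \oplus x, y \oplus f(x)\rangle$ on three qubits; amplitudes are indexed by $4x + 2t + y$, and the function names are assumptions made for this illustration.

```python
import math


def apply_vf(state, f):
    """V_f: |x, t, y> -> |x, t XOR x, y XOR f(x)>, amplitudes indexed by 4*x + 2*t + y."""
    out = [0.0] * 8
    for x in (0, 1):
        for t in (0, 1):
            for y in (0, 1):
                out[4 * x + 2 * (t ^ x) + (y ^ f(x))] += state[4 * x + 2 * t + y]
    return out


def hadamard_on_first(state):
    """H tensor I tensor I on amplitudes indexed by 4*x + 2*t + y."""
    r = 1 / math.sqrt(2)
    out = [0.0] * 8
    for rest in range(4):
        a0, a1 = state[rest], state[4 + rest]
        out[rest] = r * (a0 + a1)
        out[4 + rest] = r * (a0 - a1)
    return out


def probability_of_one(f):
    """Probability of measuring the first qubit as 1 when V_f replaces U_f."""
    # |+>|0>|->: amplitude 1/2 on |x, 0, 0> and -1/2 on |x, 0, 1> for x = 0, 1.
    state = [0.0] * 8
    for x in (0, 1):
        state[4 * x] = 0.5
        state[4 * x + 1] = -0.5
    state = hadamard_on_first(apply_vf(state, f))
    return sum(a * a for a in state[4:])
```

With this black box the probability comes out as 1/2 for constant and nonconstant $f$ alike, so the measurement no longer separates the two cases.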


7.4.2 Phase Change for a Subset of Basis Vectors 

Aim Change the phase of terms in a superposition $|\psi\rangle = \sum_i a_i|i\rangle$ depending on whether $i$ is in a 
subset $X$ of $\{0, 1, \ldots, N - 1\}$ or not. More specifically, we wish to find an efficient implementation 
of the quantum transformation 

$$S_X^\phi : \sum_{x=0}^{N-1} a_x|x\rangle \to \sum_{x \in X} a_x e^{i\phi}|x\rangle + \sum_{x \notin X} a_x|x\rangle .$$

Section 5.4 explained how to realize an arbitrary unitary transformation without regard to 
efficiency. Applying that algorithm blindly would give an implementation of $S_X^\phi$ using more 
than $N = 2^n$ simple gates. This section shows how, for any efficiently computable subset $X$, the 
transformation $S_X^\phi$ can be implemented efficiently. An efficiently implementable $S_X^\phi$ is used in 
some of the quantum algorithms we describe later that outperform classical ones. 

We can hope to implement $S_X^\phi$ efficiently only if there is an efficient algorithm for computing 
membership in $X$: the Boolean function $f : \mathbf{Z}_2^n \to \mathbf{Z}_2$, where 

$$f(x) = \begin{cases} 1 & \text{if } x \in X \\ 0 & \text{otherwise} \end{cases}$$

must be efficiently computable, say polynomial in $n$. Most subsets $X$ do not have this property. 
For subsets $X$ with this property, the main result of chapter 6 implies that there is an efficient 
quantum circuit for $U_f$. Given such an implementation for $U_f$, we can compute $S_X^\phi$ using a few 
additional steps. We use $U_f$ to compute $f$ in a temporary qubit, use the value in that qubit to 
effect the phase change, and then uncompute $f$ in order to remove any entanglement between the 
temporary qubit and the rest of the state. 


define $\mathit{Phase}_f(\phi)\,|x[k]\rangle$ = 
    qubit $a[1]$                a temporary bit        (1) 
    $U_f\,|x, a\rangle$              compute $f$ in $a$       (2) 
    $K(\phi/2)\,|a\rangle$                                    (3) 
    $T(-\phi/2)\,|a\rangle$                                   (4) 
    $U_f^{-1}\,|x, a\rangle$         uncompute $f$            (5) 

Since 

$$T(-\phi/2)K(\phi/2) = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\phi} \end{pmatrix},$$

where $K$ and $T$ are the single-qubit operations introduced in section 5.4.1, together steps (3) and (4) shift 
the phase by $e^{i\phi}$ if and only if bit $a$ is one. Strictly speaking, we do not need to do step (3) at all 
since it is a physically meaningless global phase shift: performing step (3) merely makes it easier 
to see that we get the desired result. Alternatively, we could replace steps (3) and (4) by a single 
step $\bigwedge_1 K(\phi)\,|a\rangle|x_i\rangle$, where $i$ can be any of the qubits in register $x$, since placing a phase in any 
term of the tensor product is the same as placing it in any other term. We need to uncompute $U_f$ 
in step (5) to remove the entanglement between register $|x\rangle$ and the temporary qubit so that $|x\rangle$ 
ends up in the desired state, no longer entangled with the temporary qubits. 
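
The compute, phase, uncompute pattern can be traced term by term. The sketch below assumes the definitions $K(\delta) = e^{i\delta} I$ and $T(\alpha) = \mathrm{diag}(e^{i\alpha}, e^{-i\alpha})$, which agree with the product matrix quoted above; the function name `phase_subroutine` and the dictionary representation of the state are illustrative choices, not the book’s notation.

```python
import cmath


def phase_subroutine(amplitudes, f, phi):
    """Trace steps (1)-(5) on amplitudes {x: a_x}; return the new {x: amplitude}."""
    # Steps (1)-(2): adjoin a temporary qubit a = 0 and compute f into it.
    state = {(x, f(x)): amp for x, amp in amplitudes.items()}
    # Step (3): K(phi/2) multiplies every term by e^{i phi/2}.
    state = {key: cmath.exp(1j * phi / 2) * amp for key, amp in state.items()}
    # Step (4): T(-phi/2) multiplies by e^{-i phi/2} if a = 0 and e^{+i phi/2} if a = 1.
    state = {(x, a): cmath.exp(1j * phi / 2 * (1 if a else -1)) * amp
             for (x, a), amp in state.items()}
    # Step (5): uncompute f so the temporary qubit returns to 0 and can be dropped.
    result = {}
    for (x, a), amp in state.items():
        assert a ^ f(x) == 0  # the temporary qubit is back in state |0>
        result[x] = amp
    return result
```

Terms with $f(x) = 1$ pick up the factor $e^{i\phi}$ and all other terms are unchanged, which is the action of $S_X^\phi$ for $X = \{x \mid f(x) = 1\}$.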


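Special case $\phi = \pi$ The important special case $\phi = \pi$ has an alternative, surprisingly simple 
implementation that generalizes the trick used in the algorithm for Deutsch’s problem. Given $U_f$ 
as above, the transformation $S_X^\pi$ can be implemented by initializing a temporary qubit $b$ to 
$|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$, and then using $U_f$ to compute into this register: consider 
$|\psi\rangle = \sum_{x \in X} a_x|x\rangle + \sum_{x \notin X} a_x|x\rangle$, and compute 

$$
\begin{aligned}
U_f(|\psi\rangle \otimes |-\rangle) &= U_f\Big(\sum_{x \in X} a_x|x\rangle \otimes |-\rangle\Big) + U_f\Big(\sum_{x \notin X} a_x|x\rangle \otimes |-\rangle\Big) \\
&= -\Big(\sum_{x \in X} a_x|x\rangle \otimes |-\rangle\Big) + \Big(\sum_{x \notin X} a_x|x\rangle \otimes |-\rangle\Big) \\
&= (S_X^\pi|\psi\rangle) \otimes |-\rangle .
\end{aligned}
$$

In particular, the following circuit, acting on the $n$-qubit state $|0\rangle$ together with an ancilla qubit 
in state $|1\rangle$, creates the superposition $|\psi_X\rangle = \frac{1}{\sqrt{N}} \sum_x (-1)^{f(x)}|x\rangle$: 

[Circuit diagram: inputs $|0\rangle$ and $|1\rangle$; the ancilla qubit ends in state $|-\rangle$.] 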

For elegance, and to be able to reuse the ancilla qubit, we may want to apply a final Hadamard 
transformation to the ancilla qubit, in which case the circuit is 



Geometrically, when acting on the $N$-dimensional vector space associated with the quantum 
system, the transformation $S_X^\pi$ is a reflection about the $(N - k)$-dimensional hyperplane perpendicular 
to the $k$-dimensional hyperplane spanned by $\{|x\rangle \mid x \in X\}$, where $k = |X|$: a reflection in a hyperplane sends 
any vector $|v\rangle$ perpendicular to the hyperplane to its negative $-|v\rangle$. For any unitary transformation 
$U$, the transformation 

$$U S_X^\pi U^{-1}$$

is a reflection in the hyperplane perpendicular to the hyperplane spanned by the vectors $\{U|x\rangle \mid x \in X\}$. 
Section 9.2.1 uses this geometric view of $S_X^\pi$ to build intuition for Grover’s algorithm. 

We can write the result of applying $S_X^\pi$ to the superposition $W|0\rangle$ as 

$$\frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} (-1)^{f(x)} |x\rangle ,$$

where $f$ is the Boolean function for membership in $X$, 

$$f(x) = \begin{cases} 1 & \text{if } x \in X \\ 0 & \text{otherwise.} \end{cases}$$

Conversely, given a Boolean function $f$, we define $S_f^\phi$ to be $S_X^\phi$ where $X = \{x \mid f(x) = 1\}$. 
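
The $\phi = \pi$ shortcut, in which an ancilla prepared in $|-\rangle$ turns a call to $U_f$ into a sign change, can be followed amplitude by amplitude. This is a minimal sketch under the stated setup; the function name and the dictionary representation of the state are hypothetical.

```python
import math


def sign_flip_via_minus_ancilla(amplitudes, f):
    """Apply U_f to (sum_x a_x |x>) tensor |->; return the coefficient of |x>|-> for each x."""
    r = 1 / math.sqrt(2)
    # Amplitudes of |x, b> before U_f: |-> contributes +r for b = 0 and -r for b = 1.
    before = {(x, b): amp * (r if b == 0 else -r)
              for x, amp in amplitudes.items() for b in (0, 1)}
    # U_f: |x, b> -> |x, b XOR f(x)>.
    after = {(x, b ^ f(x)): amp for (x, b), amp in before.items()}
    result = {}
    for x in amplitudes:
        coeff = after[(x, 0)] / r  # coefficient of |x>|-> read from the b = 0 part
        # The b = 1 part must be -coeff * r, i.e. the ancilla is still exactly |->.
        assert abs(after[(x, 1)] + coeff * r) < 1e-12
        result[x] = coeff
    return result
```

Applied to the uniform superposition, the returned coefficients are $\frac{1}{\sqrt{N}}(-1)^{f(x)}$, and the internal check confirms that the ancilla stays in $|-\rangle$, unentangled with the register.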


7.4.3 State-Dependent Phase Shifts 

Section 7.4.2 explained how to change efficiently the phase of all terms in a superposition corresponding 
to certain subsets of basis elements, but that construction performs the same phase change 
on all of those terms. This section considers the problem of implementing different phase shifts 
in different terms; we wish to implement transformations in which the amount of the phase shift 
depends on the quantum state. 


Aim Efficiently approximate to accuracy $s$ the transformation on $n$ qubits that changes the phase 
of the basis elements by 

$$|x\rangle \to e^{i\phi(x)}|x\rangle$$

where the function $\phi(x)$ that describes the desired phase shift angle $\phi$ for each term $x$ has an 
associated function $f : \mathbf{Z}_n \to \mathbf{Z}_s$ that is efficiently computable, and the value of the $i$th bit of 
$f(x)$ is the $i$th term in the following binary expansion for $\phi(x)$: 

$$\phi(x) \approx 2\pi \frac{f(x)}{2^s} .$$

The implementation can be only as efficient as the function $f$. Given a quantum circuit that 
efficiently implements $U_f$, we can perform the state-dependent phase shift in $O(s)$ steps in addition 
to 2 uses of $U_f$. The ability to compute $f$ efficiently is a strong one: most functions do not have 
this property. 

This paragraph shows how to implement efficiently the subprogram that changes the phase of 
an $s$-qubit standard basis state $|x\rangle$ by the angle $\phi(x) = \frac{2\pi x}{2^s}$. Let 

$$P(\phi) = T(-\phi/2)K(\phi/2) = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\phi} \end{pmatrix}$$

be the transformation that shifts the phase in a qubit if that bit is 1 but does nothing if that bit is 
0. The program 

define $\mathit{Phase}\,|a[s]\rangle$ = 
    for $i \in [0 \ldots s - 1]$ 
        $P(2\pi\, 2^{i}/2^{s})\,|a_i\rangle$ 

where $a_i$ is the bit of $a$ with weight $2^i$, performs the $s$-qubit transformation $\mathit{Phase} : |a\rangle \to \exp(i 2\pi \frac{a}{2^s})|a\rangle$. 
The Phase program is used as a subroutine in the following program that implements the $n$-qubit 
transformation $\mathit{Phase}_f : |x\rangle \to \exp(2\pi i \frac{f(x)}{2^s})|x\rangle$: 

define $\mathit{Phase}_f\,|x[k]\rangle$ = 
    qubit $a[s]$                an $s$-bit temporary register           (1) 
    $U_f\,|x\rangle|a\rangle$             compute $f$ in $a$                       (2) 
    $\mathit{Phase}\,|a\rangle$             perform phase shift by $2\pi a/2^s$      (3) 
    $U_f^{-1}\,|x\rangle|a\rangle$        uncompute $f$                            (4) 

After step (2), register $a$ is entangled with $x$ and contains the binary expansion of the angle 
$\phi(x)$ for the desired phase shift for the basis vector $|x\rangle$. Since registers $a$ and $x$ are entangled, 
changing the phase in register $a$ during step (3) is equivalent to changing the phase in register $x$. 
Step (4) uncomputes $U_f$ to remove this entanglement so that the contents of register $x$ end up in 
the desired state, no longer entangled with the temporary qubits. 
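
The bitwise construction of the phase can be checked directly. The sketch below assumes that bit $i$ of the temporary register has weight $2^i$, so that the single-qubit shift applied to it is by the angle $2\pi\, 2^i / 2^s$; this indexing convention and the names `phase_program` and `phase_f` are assumptions of the illustration.

```python
import cmath
import math


def phase_program(a, s):
    """Phase factor from applying a phase shift of 2*pi*2**i / 2**s to each 1-bit i of a."""
    factor = 1.0 + 0.0j
    for i in range(s):
        if (a >> i) & 1:  # the shift multiplies |1> by e^{i angle} and leaves |0> alone
            factor *= cmath.exp(2j * math.pi * 2 ** i / 2 ** s)
    return factor


def phase_f(amplitudes, f, s):
    """Compute f into a temporary register, apply the bitwise phase, then uncompute."""
    return {x: amp * phase_program(f(x), s) for x, amp in amplitudes.items()}
```

Because the per-bit factors multiply, the accumulated phase for register value $a$ is $\exp(2\pi i\, a / 2^s)$, using $s$ single-qubit operations.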

7.4.4 State-Dependent Single-Qubit Amplitude Shifts 

Aim Efficiently approximate, to accuracy s, rotating each term in a superposition by a 
single-qubit rotation R()3(x)) (see Section 5.4.1), where the angle /3(x) depends on the quan¬ 
tum state in another register. More specifically, we wish to implement a transformation that 
takes 

\x) <g> | b) -* \x) ® (R(P(x))\b )), 

where /3(x) ~ f(x) 2 7 f and the approximating function / : Z„ —> Z, is efficiently computable. 

From an efficient implementation of $U_f$, we can implement this transformation in $O(s)$ steps
plus two calls to $U_f$. The subroutine uses an auxiliary transformation Rot that shifts the amplitude
in qubit b by the amount specified in register a: $|a\rangle \otimes |b\rangle \to |a\rangle \otimes (R(a\frac{2\pi}{2^s})|b\rangle)$, where the contents
of the $s$-qubit register a give the angle by which to rotate up to accuracy $2^{-s}$. Figure 7.1 shows
a circuit that implements Rot. Using our program notation, this transformation can be described
more concisely by

define Rot $|a[s]\rangle|b[1]\rangle$ =
    for $i \in [0 \ldots s-1]$
        $|a_i\rangle$ control $R(\frac{2\pi}{2^i})|b\rangle$.

The desired rotation specified by the function f can be achieved by the program

define Rot_f $|x[k]\rangle|b[1]\rangle$ =
    qubit $a[s]$                      an $s$-bit temporary register
    $U_f |x\rangle|a\rangle$          compute $f$ in $a$
    Rot $|a, b\rangle$                perform rotation by $2\pi a/2^s$
    $U_f^{-1} |x\rangle|a\rangle$     uncompute $f$

Figure 7.1

Circuit for controlled rotation.


7.5 A Few Simple Quantum Algorithms 

This section presents a few simple quantum algorithms. The first three problems are black box
or oracle problems for which the quantum algorithm's query complexity is better than the query
complexity of any conceivable classical algorithm. The fourth is a problem for which the
communication complexity of a quantum protocol is better than the communication complexity for any
possible classical one. Like Deutsch's problem, the problems are a bit artificial, but they have
relatively simple quantum algorithms that can be proved to be more efficient than any possible classical
approach. Like Deutsch's algorithm, these algorithms solve these problems with certainty.

7.5.1 Deutsch-Jozsa Problem 

David Deutsch and Richard Jozsa present a quantum algorithm for the following problem, a 
multiple bit generalization of Deutsch’s problem of section 7.3.1. 

Deutsch-Jozsa Problem A function f is balanced if an equal number of input values to the
function return 0 and 1. Given a function $f : \mathbf{Z}_2^n \to \mathbf{Z}_2$ that is known to be either constant or
balanced, and a quantum oracle $U_f : |x\rangle|y\rangle \to |x\rangle|y \oplus f(x)\rangle$ for f, determine whether the
function f is constant or balanced.

The algorithm begins by using the phase change subroutine of section 7.4.2 to negate terms
of the superposition corresponding to basis vectors $|x\rangle$ with $f(x) = 1$: the subroutine returns the
state

$$|\psi\rangle = \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} (-1)^{f(i)} |i\rangle.$$

(The subroutine uses a temporary qubit in state $|-\rangle$. Just as section 7.4.1 showed for Deutsch's
algorithm, it is vital that the subroutine end with that qubit unentangled with any other qubits,
so that it can be safely ignored.) Next, apply the Walsh transform W to the resulting state $|\psi\rangle$ to
obtain

$$|\phi\rangle = \frac{1}{N} \sum_{i=0}^{N-1} \left( (-1)^{f(i)} \sum_{j=0}^{N-1} (-1)^{i \cdot j} |j\rangle \right).$$

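For constant f, the $(-1)^{f(i)} = (-1)^{f(0)}$ is simply a global phase and the state $|\phi\rangle$ is simply $|0\rangle$:

$$(-1)^{f(0)} \frac{1}{2^n} \sum_{j \in \mathbf{Z}_2^n} \left( \sum_{i \in \mathbf{Z}_2^n} (-1)^{i \cdot j} \right) |j\rangle = (-1)^{f(0)} \frac{1}{2^n} \sum_{i \in \mathbf{Z}_2^n} (-1)^{i \cdot 0} |0\rangle = (-1)^{f(0)} |0\rangle$$

because, as box 7.1 shows, $\sum_{i \in \mathbf{Z}_2^n} (-1)^{i \cdot j} = 0$ for $j \neq 0$. For f balanced

$$|\phi\rangle = \frac{1}{2^n} \sum_{j \in \mathbf{Z}_2^n} \left( \sum_{i \in X_0} (-1)^{i \cdot j} - \sum_{i \notin X_0} (-1)^{i \cdot j} \right) |j\rangle$$

where $X_0 = \{x \mid f(x) = 0\}$. This time, for $j = 0$, the amplitude is zero:
$\sum_{i \in X_0} (-1)^{i \cdot j} - \sum_{i \notin X_0} (-1)^{i \cdot j} = 0$. Thus, measurement of state $|\phi\rangle$ in the standard basis will return $|0\rangle$ with
probability 1 if f is constant and will return a non-zero $|j\rangle$ with probability 1 if f is balanced.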

This quantum algorithm solves the Deutsch-Jozsa problem with a single evaluation of $U_f$, while
any classical algorithm must evaluate f at least $2^{n-1} + 1$ times to solve the problem with certainty.
Thus, there is an exponential separation between the query complexity of this quantum algorithm 
and the query complexity for any possible classical algorithm that solves that problem with 
certainty. There are, however, classical algorithms that solve this problem in fewer evaluations, 
but only with high probability of success. (See exercise 7.4.) 
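
The two cases can be checked numerically for small n. The sketch below simulates the state vector directly rather than building a circuit; the helper names (`dot_mod2`, `walsh`, `deutsch_jozsa_prob_zero`) and the two sample functions are assumptions made for illustration.

```python
from math import sqrt

def dot_mod2(a: int, b: int) -> int:
    """Inner product a.b mod 2 of two integers read as bit strings."""
    return bin(a & b).count("1") % 2

def walsh(amps):
    """Apply the Walsh transform W to a state given by its 2^n amplitudes."""
    size = len(amps)
    return [sum((-1) ** dot_mod2(i, j) * amps[i] for i in range(size)) / sqrt(size)
            for j in range(size)]

def deutsch_jozsa_prob_zero(f, n: int) -> float:
    """Probability of measuring |0...0> after the phase change and W."""
    size = 2 ** n
    psi = [(-1) ** f(i) / sqrt(size) for i in range(size)]  # phase change subroutine
    phi = walsh(psi)
    return phi[0] ** 2

n = 3
p_constant = deutsch_jozsa_prob_zero(lambda x: 1, n)      # constant function
p_balanced = deutsch_jozsa_prob_zero(lambda x: x & 1, n)  # balanced: low bit of x
```

Here `p_constant` comes out as 1 and `p_balanced` as 0, up to floating-point rounding, in agreement with the derivation above.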

7.5.2 Bernstein-Vazirani Problem 

The problem is to determine the value of an unknown bit string u of length n where one is allowed
only queries of the form $q \cdot u$ for some query string q. The best classical algorithm uses $O(n)$ calls
to $f_u(q) = q \cdot u \bmod 2$. A quantum algorithm, closely related to the algorithm we just gave for the
Deutsch-Jozsa problem, can find u in just a single call to $U_{f_u}$: on a quantum computer it is possible
to determine u exactly with a single query (in superposition). Let $f_u(q) = q \cdot u \bmod 2$ and

$$U_{f_u} : |q\rangle|b\rangle \to |q\rangle|b \oplus f_u(q)\rangle.$$

The following circuit (figure 7.2) solves this problem with certainty using only one call to $U_{f_u}$.
To understand how this circuit works, recall from section 7.4.2 that in the special case $\phi = \pi$, the
phase change subroutine can be accomplished by the circuit
In this case, applying this circuit results in the state

$$|\psi_X\rangle = \frac{1}{\sqrt{2^n}} \sum_q (-1)^{f_u(q)} |q\rangle = \frac{1}{\sqrt{2^n}} \sum_q (-1)^{u \cdot q} |q\rangle$$

in the first register. The next paragraph shows that applying the Walsh-Hadamard transformation
W to this state produces the state $|u\rangle$.

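Recall that $W|x\rangle = \frac{1}{\sqrt{2^n}} \sum_z (-1)^{x \cdot z} |z\rangle$. Thus

$$W|\psi_X\rangle = W\left( \frac{1}{\sqrt{2^n}} \sum_q (-1)^{u \cdot q} |q\rangle \right) = \frac{1}{2^n} \sum_z \left( \sum_q (-1)^{u \cdot q + z \cdot q} \right) |z\rangle.$$

A fact from box 7.1 tells us that $(-1)^{u \cdot q + z \cdot q} = (-1)^{(u \oplus z) \cdot q}$. Furthermore, equation 7.1 tells us that
the internal sum is 0 unless $u \oplus z = 0$, which implies that the only term that remains is the $u = z$
term. Thus,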



$$W|\psi_X\rangle = \frac{1}{2^n} \sum_z \left( \sum_q (-1)^{(u \oplus z) \cdot q} \right) |z\rangle = |u\rangle.$$

Thus measurement in the standard basis gives $|u\rangle$ with certainty.

Figure 7.2

Circuit for the Bernstein-Vazirani algorithm.
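
The calculation can be reproduced numerically. This is a minimal state-vector sketch, assuming the phase change step has already produced amplitudes $(-1)^{u \cdot q}/\sqrt{2^n}$; the function name `bernstein_vazirani` and the sample string are illustrative choices.

```python
from math import sqrt

def dot_mod2(a: int, b: int) -> int:
    """Inner product a.b mod 2 of two integers read as bit strings."""
    return bin(a & b).count("1") % 2

def bernstein_vazirani(u: int, n: int):
    """Return the measurement probabilities of W applied to the phase-encoded state."""
    size = 2 ** n
    # state after the single (superposed) query: amplitudes (-1)^(u.q) / sqrt(2^n)
    psi = [(-1) ** dot_mod2(u, q) / sqrt(size) for q in range(size)]
    # Walsh-Hadamard transformation
    out = [sum((-1) ** dot_mod2(q, z) * psi[q] for q in range(size)) / sqrt(size)
           for z in range(size)]
    return [amp * amp for amp in out]

probs = bernstein_vazirani(0b01101, 5)
measured = max(range(len(probs)), key=probs.__getitem__)
```

All of the probability lands on the index equal to u, so `measured` recovers the hidden string from one simulated query.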


A Simpler Explanation Using quantum parallelism to compute on all possible inputs at the same
time, and then cleverly manipulating the resulting superposition, is a common explanation for how
quantum algorithms work. The description we gave for the Bernstein-Vazirani algorithm fits this 
framework. There is a question, however, as to whether quantum parallelism is the right way of 
looking at algorithms. To illustrate this point, we give an alternative description, due to Mermin, 
of exactly this same algorithm. 

The key to Mermin's explanation of the algorithm is to look at the circuit in the Hadamard
basis. To understand what the quantum black box for $U_{f_u}$ does in the Hadamard basis, recognize
that it behaves as if it contained a circuit consisting of $C_{not}$ operations from some of the qubits
to the ancilla qubit: this circuit contains a $C_{not}$ from qubit i to the ancilla if and only if the ith bit
of u is 1 (see figure 7.3). Recall from section 5.2.4 that Hadamard operations reverse the control
and target roles of the qubits:



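[Circuit identity: a $C_{not}$ with Hadamard gates applied to both qubits before and after it equals a $C_{not}$ with control and target exchanged.]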


The Bernstein-Vazirani algorithm consists of starting with the state $|0 \ldots 0\rangle|1\rangle$ and applying
Hadamard transformations to every qubit before and after the call to the black box for $U_{f_u}$ (see
figure 7.2). Thus, the Bernstein-Vazirani algorithm behaves as if it were a circuit consisting only
of $C_{not}$ operations from the ancilla qubit to the qubits corresponding to 1-bits of u (see figure
7.4). From this view of the circuit, it is immediate that the qubits end up in the state $|u\rangle$, so this
much simpler explanation, which does not speak of quantum parallelism or of "computing on all
possible inputs," is the right way to look at the algorithm.

Figure 7.3

For u = 01101, the black box for $U_{f_u}$ behaves as if it contained this circuit, consisting of $C_{not}$ gates for each 1-bit of u.

Figure 7.4

For u = 01101, the Bernstein-Vazirani algorithm behaves as if it were implemented by this simple circuit consisting of a
$C_{not}$ for each 1-bit of u.
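
The identity that Mermin's argument rests on, that Hadamard gates on both qubits exchange the control and target of a $C_{not}$, can be verified with 4 x 4 matrices. The sketch below uses plain nested lists; the helper names `matmul` and `kron` and the basis ordering $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ are conventions chosen here for illustration.

```python
from math import sqrt

def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Tensor (Kronecker) product of two matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

h = 1 / sqrt(2)
H = [[h, h], [h, -h]]
# Basis order |00>, |01>, |10>, |11>. First qubit controls, second is the target.
CNOT_FIRST_CONTROLS = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
# Second qubit controls, first is the target: |01> <-> |11>.
CNOT_SECOND_CONTROLS = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]

HH = kron(H, H)
conjugated = matmul(HH, matmul(CNOT_FIRST_CONTROLS, HH))
roles_reversed = all(abs(conjugated[i][j] - CNOT_SECOND_CONTROLS[i][j]) < 1e-12
                     for i in range(4) for j in range(4))
```

With `roles_reversed` true, each $C_{not}$ inside the black box, once surrounded by Hadamard gates, acts from the ancilla onto the corresponding qubit, which is the circuit of figure 7.4.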


7.5.3 Simon's Problem 

Simon's problem: Given a 2-to-1 function f such that $f(x) = f(x \oplus a)$ for all $x \in \mathbf{Z}_2^n$, find the
hidden string $a \in \mathbf{Z}_2^n$.


Simon describes a quantum algorithm that can find a in only $O(n)$ calls to $U_f$, followed by
$O(n^2)$ additional steps, whereas the best a classical algorithm can do is $O(2^{n/2})$ calls to f.
Simon’s algorithm suggested to Shor an approach to the factoring problem that is now famous 
as Shor’s algorithm. As we will see in chapter 8, there are structural similarities between Shor’s 
algorithm and Simon’s algorithm. 

To determine a, create the superposition $\sum_x |x\rangle|f(x)\rangle$. Measuring the right part of the register
projects the state of the left register to $\frac{1}{\sqrt{2}}(|x_0\rangle + |x_0 \oplus a\rangle)$, where $f(x_0)$ is the measured value.
Applying the Walsh-Hadamard transformation W leads to

$$W\left( \frac{1}{\sqrt{2}} (|x_0\rangle + |x_0 \oplus a\rangle) \right) = \frac{1}{\sqrt{2^{n+1}}} \sum_y \left( (-1)^{x_0 \cdot y} + (-1)^{(x_0 \oplus a) \cdot y} \right) |y\rangle$$

$$= \frac{1}{\sqrt{2^{n+1}}} \sum_y (-1)^{x_0 \cdot y} \left( 1 + (-1)^{a \cdot y} \right) |y\rangle = \frac{2}{\sqrt{2^{n+1}}} \sum_{y \cdot a \text{ even}} (-1)^{x_0 \cdot y} |y\rangle.$$

Measurement of this state results in a random y such that $y \cdot a = 0 \bmod 2$, so the unknown bits $a_i$
of a must satisfy the equation $y_0 \cdot a_0 \oplus \cdots \oplus y_{n-1} \cdot a_{n-1} = 0$. This computation is repeated until
n linearly independent equations have been found. Each time the computation is repeated,
the resulting equation has at least a 50 percent chance of being linearly independent of the
previous equations obtained. After repeating the computation 2n times, there is a 50 percent
chance that n linearly independent equations have been found. These equations can be solved to
find a in $O(n^2)$ steps. Thus, with high likelihood, the hidden string a will be found with $O(n)$
calls to $U_f$, followed by $O(n^2)$ steps to solve the resulting set of equations.
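
The classical post-processing can be sketched as follows. The quantum part is replaced by a stand-in, `sample_y`, that draws a uniformly random y with $y \cdot a = 0 \bmod 2$, which is the distribution derived above; all function names are hypothetical. The sketch stops once $n - 1$ independent equations are in hand, which is the most that can exist when a is nonzero, and for brevity it finds the solution by exhaustive search rather than by the $O(n^2)$ elimination mentioned in the text.

```python
import random

def dot_mod2(a: int, b: int) -> int:
    """Inner product a.b mod 2 of two integers read as bit strings."""
    return bin(a & b).count("1") % 2

def sample_y(a: int, n: int, rng: random.Random) -> int:
    """Classical stand-in for one run of the circuit: uniform y with y.a = 0 mod 2."""
    while True:
        y = rng.randrange(2 ** n)
        if dot_mod2(y, a) == 0:
            return y

def add_if_independent(basis: dict, y: int) -> bool:
    """Gaussian elimination over GF(2); `basis` maps a leading-bit position to a vector."""
    while y:
        lead = y.bit_length() - 1
        if lead not in basis:
            basis[lead] = y
            return True
        y ^= basis[lead]          # cancel the leading bit and keep reducing
    return False                  # y reduced to 0: dependent on earlier equations

def find_hidden_string(a: int, n: int, rng: random.Random):
    basis, runs = {}, 0
    while len(basis) < n - 1:
        runs += 1
        add_if_independent(basis, sample_y(a, n, rng))
    # Exhaustive search over nonzero candidates (only sensible for tiny n).
    candidates = [c for c in range(1, 2 ** n)
                  if all(dot_mod2(c, y) == 0 for y in basis.values())]
    return candidates[0], runs

found, runs = find_hidden_string(0b10110, 5, random.Random(1))
```

Each discarded sample corresponds to a repetition whose equation was linearly dependent on the earlier ones, which is why `runs` can exceed the number of equations kept.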

7.5.4 Distributed Computation 

This section describes a different type of quantum algorithm, one for which communication 
complexity is the concern. Like dense coding and teleportation, it uses entangled pairs that can 
be distributed ahead of time, independent of the computation, so these qubits are not counted 
as qubits transmitted during the solution of the problem (though the exponential savings would 
remain even if they were counted). 

The Problem Let $N = 2^n$. Alice and Bob are each given an N-bit number, u and v respectively.
The objective is for Alice to compute an n-bit number a and Bob to compute an n-bit number b
such that

$d_H(u, v) = 0 \to a = b$
$d_H(u, v) = N/2 \to a \neq b$
else $\to$ no condition on a and b

where $d_H(u, v)$ is the Hamming distance between u and v. In other words, Alice and Bob need
an algorithm that produces a and b from any u and v such that if u = v, then a = b; if u and v
differ in half of their bits, then $a \neq b$; and if the Hamming distance of u and v is anything other
than 0 or N/2, a and b can be anything.

This problem is nontrivial because u and v are exponentially larger than a and b. Given a 
sufficient supply of entangled pairs, this problem can be solved without additional communication 
between Alice and Bob, while a classical solution requires communication of at least N/2 bits
between the two parties. 

Suppose Alice and Bob share n entangled pairs of particles $(a_i, b_i)$, each in state $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$,
where Alice can access particles $a_i$ and Bob can access particles $b_i$. We write the state of
the 2n particles making up these n entangled pairs in order $a_0, a_1, \ldots, a_{n-1}, b_0, b_1, \ldots, b_{n-1}$, so
the entire 2n-qubit state is written $\frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} |i, i\rangle$, where Alice can manipulate the first n qubits
and Bob can manipulate the last n qubits.

The problem can be solved without additional communication as follows. Using the phase
change subroutine of section 7.4.2, with $f(i) = u_i$, Alice performs $\sum_i |i\rangle \to \sum_i (-1)^{u_i} |i\rangle$ followed
by the Walsh transform W on her n qubits. Bob performs the same computation on his n
qubits using $f(i) = v_i$. Together their particles are now in the common global state

$$|\psi\rangle = W^{(2n)} \left( \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} (-1)^{u_i \oplus v_i} |i\rangle|i\rangle \right).$$

Alice and Bob now measure their respective part of the state to obtain results a and b. We need 
to show that a and b have the desired properties. 

The probability that measurement results in $a = x = b$ is $|\langle x, x|\psi\rangle|^2$. We wish to show that
this probability, summed over all x, is 1 if u = v and that it is 0 for every x if $d_H(u, v) = N/2$. Let us simplify the state as follows, where
the superscript in $W^{(l)}$ indicates that W is acting on an l-qubit state:

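$$|\psi\rangle = W^{(2n)} \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} (-1)^{u_i \oplus v_i} |i\rangle|i\rangle$$

$$= \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} (-1)^{u_i \oplus v_i} \left( W^{(n)}|i\rangle \otimes W^{(n)}|i\rangle \right)$$

$$= \frac{1}{N\sqrt{N}} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \sum_{k=0}^{N-1} (-1)^{u_i \oplus v_i} (-1)^{i \cdot j} (-1)^{i \cdot k} |jk\rangle.$$

Now

$$\langle x, x|\psi\rangle = \frac{1}{N\sqrt{N}} \sum_{i=0}^{N-1} (-1)^{u_i \oplus v_i} (-1)^{i \cdot x} (-1)^{i \cdot x} = \frac{1}{N\sqrt{N}} \sum_{i=0}^{N-1} (-1)^{u_i \oplus v_i}.$$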


If u = v, then $(-1)^{u_i \oplus v_i} = 1$ and $\langle x, x|\psi\rangle = \frac{1}{\sqrt{N}}$, so the probability $|\langle x, x|\psi\rangle|^2 = \frac{1}{N}$. The
probability, summed over the N possible values of x, is 1, so when Alice and Bob measure they obtain,
with probability 1, states a and b with a = b = x for some bit string x. For $d_H(u, v) = N/2$, the
sum $\langle x, x|\psi\rangle = \frac{1}{N\sqrt{N}} \sum_{i=0}^{N-1} (-1)^{u_i \oplus v_i}$ has an equal number of +1 and -1 terms, which cancel to
give $\langle x, x|\psi\rangle = 0$. Thus, in this case, Alice and Bob measure the same value with probability 0.
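
Both cases can be checked by evaluating the triple-sum formula for the amplitudes directly. The sketch below is a numerical check for N = 8, not a model of the protocol's communication; the names `amplitude` and `prob_same_result` and the sample inputs are illustrative.

```python
from math import sqrt

def dot_mod2(a: int, b: int) -> int:
    """Inner product a.b mod 2 of two integers read as bit strings."""
    return bin(a & b).count("1") % 2

def amplitude(u, v, j: int, k: int) -> float:
    """<j, k | psi> from the triple-sum expression for |psi>."""
    size = len(u)
    total = sum((-1) ** (u[i] ^ v[i]) * (-1) ** dot_mod2(i, j) * (-1) ** dot_mod2(i, k)
                for i in range(size))
    return total / (size * sqrt(size))

def prob_same_result(u, v) -> float:
    """Probability that Alice and Bob measure the same n-bit value."""
    return sum(amplitude(u, v, x, x) ** 2 for x in range(len(u)))

u = [0, 1, 1, 0, 1, 0, 0, 1]                   # N = 8 bits, so n = 3
v_far = [1 - bit for bit in u[:4]] + u[4:]     # differs from u in exactly N/2 bits
p_equal_inputs = prob_same_result(u, u)
p_half_distance = prob_same_result(u, v_far)
```

The first probability evaluates to 1 and the second to 0, matching the two conditions the protocol must satisfy.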


7.6 Comments on Quantum Parallelism 


Because quantum parallelism's role in quantum computation has often been misunderstood, we
make a few comments to address some common misconceptions. The notation

$$\frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x, f(x)\rangle$$

suggests that exponentially more computation is being done by the quantum operation $U_f$ acting
on the superposition $\frac{1}{\sqrt{N}} \sum_x |x, 0\rangle$ than by a classical computer computing $f(x)$ from x. The next
paragraph explains how this view is misleading and how it does not explain the power of quantum
computation. Similarly, the exponential size of the n-qubit quantum state space may seem to
suggest that an exponential speedup over the classical case can always be obtained using quantum
parallelism. This statement is generally incorrect, although in certain special cases quantum
computation does provide such speedups. We elaborate briefly on each of these statements.

As explained in section 7.1.2, only one input/output pair can be extracted by measurement
in the standard basis from the superposition generated by quantum parallelism. It is not possible to
extract more input/output pairs in any other way since, as section 4.3.1 explained, only m bits of
information can be extracted from an m-qubit state. Thus, while the $2^n$ values of $f(x)$ appear in
the single superposition state, it still takes $2^n$ computations of $U_f$ to obtain them all, no better than
the classical case. This limitation leaves open the possibility that any classical algorithm that takes
$2^n$ steps to obtain n bits of output could be done in a single step on a quantum computer. While
some algorithms do give speedups of this magnitude over classical algorithms, the optimality of
Grover's algorithm proved in section 9.1 shows that there are problems of this form for which
it is known that no quantum algorithm can provide an exponential speedup. Furthermore, lower 
bound results exist that show that for many problems quantum computation cannot provide any 
speedup at all. Thus, quantum parallelism and quantum computation do not, in general, provide 
the exponential speedup suggested by the notation. 

Furthermore, a superposition like $\frac{1}{\sqrt{N}} \sum_x |x, f(x)\rangle$ is still only a single state of the quantum
state space. The n-qubit quantum state space is extremely large, so large that the vast majority of
states cannot even be approximated by an efficient quantum algorithm. (The elegant proof goes 
beyond the scope of this book. A reference is given in section 7.9.) Thus, an efficient quantum 
algorithm cannot even come close to most states in the state space. For this reason, quantum 
parallelism does not, and efficient quantum algorithms cannot, make use of the full state space. 

As Mermin’s explanation of the Bernstein-Vazirani algorithm of section 7.5.2 illustrates, even 
when quantum parallelism can be used to describe an algorithm, it is not necessarily correct to 
view it as key to the algorithm. Understanding where the power of quantum computation comes 
from remains an open research question. The status of entanglement as one of the keys will be 
discussed in the introduction to chapter 10 and in section 13.9, which addresses this question 
explicitly. 

When algorithms are described in terms of quantum parallelism, the heart of the algorithm is 
the way in which the algorithm manipulates the state generated by quantum parallelism. This sort 
of manipulation has no classical analog and requires nontraditional programming techniques. We 
list a couple of general techniques: 

• Amplify output values of interest. The general idea is to transform the state in such a way 
that values of interest have a larger amplitude and therefore have a higher probability of being 
measured. Grover’s algorithm of chapter 9 exploits this approach, as do the many closely related 
algorithms. 

• Find properties of the set of all the values of $f(x)$. This idea is exploited in Shor's
algorithm of chapter 8, which uses a quantum Fourier transformation to obtain the period of f. The
algorithms given in section 7.5 for the Deutsch-Jozsa problem, the Bernstein-Vazirani problem,
and Simon's problem all take this approach.

7.7 Machine Models and Complexity Classes 

Computational complexity classes are defined in terms of a language and machines that recognize
that language. In this section, the term machine refers to any quantum or classical computing
device that runs a single algorithm on which we can count the number of computation steps and
storage cells used. A language L over an alphabet $\Sigma$ is a subset of the finite strings $\Sigma^*$ of elements
from $\Sigma$. A language L is recognized by a machine M if, for each string $x \in \Sigma^*$, the machine M
can determine if $x \in L$. Exactly what determine means depends on the kind of machine we are
considering. For example, given input x, a classical deterministic machine may answer Yes, $x \in L$,
or No, $x \notin L$, or it may never halt. Probabilistic and quantum machines might answer Yes or No
correctly with certain probabilities. We consider five kinds of classical machines, deterministic
(D), nondeterministic (N), randomized (R), probabilistic (Pr), and bounded probability of error
(BP). Each of these types of classical machine has a quantum analog (EQ, NQ, RQ, PrQ, BQ). Of
particular interest will be quantum deterministic (exact) machines (EQ), and quantum bounded
probability of error machines (BQ). Section 7.7.1 uses these types of machine to define numerous
complexity classes of varying resource constraints. We now more rigorously describe exactly
how the different kinds of machines recognize a language.

For each kind of machine M, there is a single language $L_M$ that M recognizes. For example,
a machine is deterministic if whenever it sometimes answers Yes on a given input x it always
answers Yes on that input. A deterministic machine D recognizes the language

$$L_D = \{x \in \Sigma^* \mid D(x) = \mathrm{Yes}\} = \{x \mid P(D(x) = \mathrm{Yes}) = 1\}.$$

By definition of deterministic, for all $x \notin L_D$, the probability $P(D(x) = \mathrm{Yes})$ is zero. As a second
example, a bounded probability of error machine, acting on a given input x, either answers
Yes with probability at least $1/2 + \epsilon$ or with probability no more than $1/2 - \epsilon$. Given a
bounded probability of error machine BP, $L_{BP} = \{x \mid P(BP(x) = \mathrm{Yes}) \geq 1/2 + \epsilon\}$. For $x \notin L_{BP}$,
$P(BP(x) = \mathrm{Yes}) \leq 1/2 - \epsilon$.

A machine may not give an answer at all for some inputs. Table 7.1 summarizes the conditions for
the various types of machines we consider. The quantum machine types recognize a language with 
the same probability as their classical counterparts. Figure 7.5 illustrates containment relations 
between the kinds of machines. Containment means that by definition each D machine, for 
example, is also an R machine. 

Table 7.1

Probability for a particular kind of machine to answer Yes when given an input x that is or is not an element of language L

Prefix   Kind of machine                          $P(x \in L)$           $P(x \notin L)$

Classical
D        Deterministic                            $= 1$                  $= 0$
N        Nondeterministic                         $> 0$                  $= 0$
R        Randomized (Monte Carlo)                 $> 1/2 + \epsilon$     $= 0$
Pr       Probabilistic                            $> 1/2$                $\leq 1/2$
BP       Bounded probability of error             $> 1/2 + \epsilon$     $< 1/2 - \epsilon$

Quantum
EQ       Quantum deterministic (exact)            $= 1$                  $= 0$
BQ       Quantum bounded probability of error     $> 1/2 + \epsilon$     $< 1/2 - \epsilon$

Figure 7.5

Containment relations between kinds of machines. These relations hold for classical and quantum machines, and for time
and space complexity.

A language is recognized by a kind of machine if there exists a machine of that kind that
recognizes it. The set of languages recognized by the types of machines we have defined does
not depend on the particular value of $\epsilon$. For example, suppose we are given a Pr machine M that
answers Yes for $x \in L$ with probability $P(x \in L) > 1/2 + \epsilon$. We can construct a new Pr machine M'
that runs M three times and answers Yes if M answers Yes at least two times. Then M' will accept
$x \in L$ with probability $> \frac{1}{2} + \frac{3}{2}\epsilon - 2\epsilon^3$. Some authors use a fixed value such as $\epsilon = 1/4$. The case
$P(x \in L) > 1/2$ is quite different from $P(x \in L) > 1/2 + \epsilon$, however, since in the former case
no polynomial number of repetitions can guarantee an increase in the success probability above
a given threshold $\frac{1}{2} + \epsilon$.
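
The effect of repetition can be computed exactly from the binomial distribution. The sketch below assumes independent runs and a majority vote; the function name `majority_success` and the values of $\epsilon$ and the run counts are illustrative choices.

```python
from math import comb

def majority_success(p: float, runs: int) -> float:
    """Probability that more than half of `runs` independent trials succeed."""
    return sum(comb(runs, k) * p ** k * (1 - p) ** (runs - k)
               for k in range(runs // 2 + 1, runs + 1))

eps = 0.1
three_runs = majority_success(0.5 + eps, 3)
closed_form = 0.5 + 1.5 * eps - 2 * eps ** 3   # 3p^2 - 2p^3 evaluated at p = 1/2 + eps
many_runs = majority_success(0.5 + eps, 101)   # keeps improving with more runs
no_gap = majority_success(0.5, 101)            # with no gap, repetition gives no gain
```

For p exactly 1/2 + $\epsilon$ the three-run majority succeeds with probability $3p^2 - 2p^3$, which expands to the bound quoted above, and more runs push it toward 1. A machine whose success probability is only guaranteed to exceed 1/2 may sit arbitrarily close to the `no_gap` case, where voting does not help.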

7.7.1 Complexity Classes 

In addition to being concerned about the probability that a machine answers correctly, complexity
theory is concerned about quantifying the amount of resources, particularly time and space, that
a machine uses to obtain its answers. A machine recognizes a language L in time $O(f)$ if, for
any string $x \in \Sigma^*$ of length n, it answers Yes or No within $t(n)$ steps and $t \in O(f)$. A machine
recognizes a language L in space $O(f)$ if, for any string $x \in \Sigma^*$ of length n, it answers Yes or No
using at most $s(n)$ storage units, measured in bits or qubits, where $s \in O(f)$.

A complexity class is the set of languages recognized by a particular kind of machine within
given resource bounds. Specifically, for $m \in \{D, EQ, N, R, Pr, BP\}$, we consider the classes
mTime(f) and mSpace(f). Language L is in complexity class mTime(f) if there exists a
machine M of kind m that recognizes L in time $O(f)$. Language L is in complexity class
mSpace(f) if there exists a machine M of kind m that recognizes L in space $O(f)$.

We are particularly interested in machines that use only a polynomial amount of resources, and
to a lesser extent in those that use only an exponential amount. For example, we are interested
in the class P = DTime($n^k$) of machines that respond to an input of length n using only $O(n^k)$
time for some k. The following shorthand notations are common:


P          DTime($n^k$)
EQP        EQTime($n^k$)
NP         NTime($n^k$)
R          RTime($n^k$)
PP         PrTime($n^k$)
BPP        BPTime($n^k$)
BQP        BQTime($n^k$)
PSpace     DSpace($n^k$)
NPSpace    NSpace($n^k$)
EXP        DTime($k^n$)


For time classes, we can assume that machines always halt because the function f provides
an upper bound on the possible runtimes. However, machines in the space complexity classes
may never halt on some inputs. Therefore, we define mHSpace(f) to be the class of languages
that are recognized by a halting machine of type m in space $O(f)$. Obviously, mHSpace(f) $\subseteq$
mSpace(f). Note that in the circuit model all computations will terminate. Analysis of the
complexity of nonhalting space classes requires a different model of computation, such as quantum
Turing machines.

7.7.2 Complexity: Known Results 

We give informal arguments for some of the containment relations involving quantum complexity 
classes. Figure 7.6 depicts the known containment relation involving classical and quantum time 
complexity classes. Nothing is as yet known about the relation between BQP and NP or PP.

Figure 7.6

Containment relation involving classical and quantum complexity classes.


P $\subseteq$ EQP Any classical polynomial time computation can be performed by a polynomial size
circuit family. This inclusion follows from the main result of chapter 6: all classical circuits can
be done reversibly with only a slight increase in time and space, and any reversible polynomial
time algorithm can be turned into a polynomial time exact quantum algorithm.

EQP $\subseteq$ BQP This containment is trivial since every exact quantum algorithm has bounded
probability of error.

BPP $\subseteq$ BQP Any computation performed by a machine M in BPP can be approximated
arbitrarily closely by a machine $\tilde{M}$ that makes a single equiprobable binary decision at each step.
Furthermore, this decision tree is of polynomial depth, so a sequence of choices can be encoded
by a polynomial size bit string c. From $\tilde{M}$ one can construct a deterministic machine $\tilde{M}_d$ that,
when applied to c and x, will perform the same computation as $\tilde{M}$ applied to x making the random
choices c. For the deterministic machine $\tilde{M}_d$ there is a polynomial time quantum machine $M_q$ that
can be applied to the superposition of all possible random choices c applied to x, $\sum_c |c, x, 0\rangle$,
producing $\sum_c |c, x, \tilde{M}_d(c, x)\rangle$. In effect, $M_q$ performs all possible computations of $\tilde{M}$ on x in
parallel. The probability of reading an accepting answer from $M_q$ is the same as the probability
that $\tilde{M}$ would accept x.

It is not known whether BPP $\subseteq$ BQP is a proper inclusion. In fact, showing BPP $\neq$ BQP
would answer the open question as to whether BPP = PSpace.

BQP $\subseteq$ PSpace Consider a machine in BQP acting on an input of size n that starts from a
known state $|\psi_0\rangle = |0\rangle$ and proceeds for k steps followed by a measurement. We show that such
a machine can be approximated arbitrarily closely, in the sense of computing any amplitude of
the final state to a specified precision, in polynomial space. Let $|\psi_i\rangle = \sum_j a_{ij}|j\rangle$ denote the state
after step i. Each state $|\psi_i\rangle$, $i \neq 0$, may be a superposition of an exponential (in n) number of
basis vectors. Yet, using only space polynomial in n, it is possible to compute the amplitude $a_{kj}$
of an arbitrary basis vector in the final superposition $|\psi_k\rangle$.

We may assume each step corresponds to a primitive quantum gate $U_i$ that operates on at most
$d \leq 3$ quantum bits. For these transformations, we show that the amplitude $a_{i+1,j}$ of basis vector
$|j\rangle$ in state $|\psi_{i+1}\rangle$ depends only on the amplitudes $a_{ij}$ of the small number ($2^d \leq 8$) of basis
vectors of the preceding state $|\psi_i\rangle$ that differ from $|j\rangle$ only in the bits that are being operated on
by the gate. Without loss of generality, assume that $U = U_{i+1}$ operates on the last d quantum bits.
We will use the shorthand $x \circ y$ to stand for $2^d x + y$ and let $u_{qr} = \langle r|U|q\rangle$ for basis elements
$|r\rangle$ and $|q\rangle$ in the standard basis for a $2^d$-dimensional space.

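$$|\psi_{i+1}\rangle = (I^{n-d} \otimes U)|\psi_i\rangle$$

$$= \sum_j a_{ij} (I^{n-d} \otimes U)|j\rangle$$

$$= \sum_{p=0}^{2^{n-d}-1} \sum_{q=0}^{2^d-1} a_{i,p \circ q} |p\rangle \otimes U|q\rangle$$

$$= \sum_p \sum_q a_{i,p \circ q} |p\rangle \otimes \sum_{r=0}^{2^d-1} u_{qr}|r\rangle$$

$$= \sum_p \sum_r \left( \sum_{q=0}^{2^d-1} u_{qr}\, a_{i,p \circ q} \right) |p\rangle|r\rangle.$$

It follows that each amplitude $a_{i+1,p \circ r} = \sum_{q=0}^{2^d-1} u_{qr}\, a_{i,p \circ q}$ depends only on $2^d$ amplitudes $a_{i,p \circ q}$
of the preceding state.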

By induction, we argue that it requires storage of $i 2^d$ amplitudes to compute a single amplitude
of state $|\psi_i\rangle$. Since we know $|\psi_0\rangle$, it takes no space to compute the amplitude $\langle j|\psi_0\rangle$ for any j. As
we have just seen, the amplitude $a_{i+1,j}$ can be computed from $2^d$ amplitudes of $|\psi_i\rangle$. We can do
this by computing each of these amplitudes in turn, which requires storing at most $i 2^d$ amplitude
values, storing the resulting $2^d$ amplitudes, and computing $a_{i+1,j}$. Overall, this process requires
storage of $(i + 1) 2^d$ amplitude values.

We take M to be the maximum precision required at any point in the computation to obtain the
desired precision at the end. The total accumulated error is no larger than the sum of the errors of
individual steps. Thus, the number M grows only linearly in the number of steps needed, and any
one amplitude value can be stored in space M and the amplitude of any basis vector of the final
superposition after k steps can be computed in $k 2^d M$ space. Since by assumption k is polynomial
in n, d is a constant no more than 3, and M only grows linearly with k, it takes only polynomial
space to compute a single amplitude of the final state $|\psi_k\rangle$.

To simulate the algorithm, choose a basis vector $|j\rangle$ randomly (or, if you prefer, in a specified
order) and calculate the amplitude $a_{kj}$. Generate a random number between 0 and 1 and see if it
is less than $|a_{kj}|^2$. If so, return $|j\rangle$. Otherwise free all the space, choose another basis vector, and
repeat. Repeat as often as necessary until a basis vector is returned (time is not an issue!). Thus, 
any computation in BQP can be simulated classically in polynomial space. 
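
The recursion behind this argument can be written down directly. The following is a minimal sketch under assumptions made here for illustration: a circuit is a list of pairs (gate matrix, qubit positions), basis states are integers whose bit `pos` is the value of qubit `pos`, gate entries are real, and `gate[r][q]` holds $\langle r|U|q\rangle$. It trades time for space exactly as in the text: the recursion depth equals the number of gates, while the running time is exponential in that number.

```python
from math import sqrt

def amplitude(circuit, step: int, j: int) -> float:
    """Amplitude of basis state j after the first `step` gates, starting from |0...0>."""
    if step == 0:
        return 1.0 if j == 0 else 0.0
    gate, qubits = circuit[step - 1]
    # r: the bits of j on the gate's qubits, packed in the order listed in `qubits`
    r = 0
    for pos in qubits:
        r = (r << 1) | ((j >> pos) & 1)
    total = 0.0
    for q in range(len(gate)):
        if gate[r][q] == 0:
            continue
        # j_prev agrees with j except on the gate's qubits, where it holds q
        j_prev = j
        for idx, pos in enumerate(qubits):
            bit = (q >> (len(qubits) - 1 - idx)) & 1
            j_prev = (j_prev & ~(1 << pos)) | (bit << pos)
        total += gate[r][q] * amplitude(circuit, step - 1, j_prev)
    return total

h = 1 / sqrt(2)
HADAMARD = [[h, h], [h, -h]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # (control, target) order
# Hadamard on qubit 1, then C_not with control qubit 1 and target qubit 0.
bell_circuit = [(HADAMARD, [1]), (CNOT, [1, 0])]
bell_amplitudes = [amplitude(bell_circuit, 2, j) for j in range(4)]
```

For this two-gate circuit the amplitudes of $|00\rangle$ and $|11\rangle$ are both $1/\sqrt{2}$ and the others vanish. Each call keeps only the few values for the current gate, which is the source of the polynomial space bound.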

7.8 Quantum Fourier Transformations 

The quantum Fourier transformation (QFT) is the single most important quantum subroutine.
It and its generalizations are used in many quantum algorithms that achieve a speedup over
classical algorithms. Appendix B.2.2 discusses generalizations of quantum Fourier transforms
and shows that the Walsh-Hadamard transformation is a generalized quantum Fourier transform.
The quantum Fourier transformation (QFT) is based on the classical discrete Fourier transformation
(DFT) and its efficient implementation, the fast Fourier transform (FFT). We briefly describe
the classical discrete Fourier transform (DFT) and the fast Fourier transform (FFT) before describing
the quantum Fourier transform (QFT) and its surprisingly efficient quantum implementation.

7.8.1 The Classical Fourier Transform 

Discrete Fourier Transform The discrete Fourier transform (DFT) operates on a discrete
complex-valued function to produce another discrete complex-valued function. Given a function
$a : [0, \ldots, N-1] \to \mathbf{C}$, the discrete Fourier transform produces a function
$A : [0, \ldots, N-1] \to \mathbf{C}$ defined by

$$A(x) = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} a(k) \exp\left(2\pi i \frac{kx}{N}\right).$$

The discrete Fourier transform can be viewed as a linear transformation taking column vector
$(a(0), \ldots, a(N-1))^T$ to $(A(0), \ldots, A(N-1))^T$ with matrix representation F with entries
$F_{xk} = \frac{1}{\sqrt{N}} \exp(2\pi i \frac{kx}{N})$. The values $A(0), \ldots, A(N-1)$ are called the Fourier coefficients of the
function a.



Example 7.8.1 Let $a : [0, \ldots, N-1] \to \mathbf{C}$ be the periodic function $a(x) = \exp(-2\pi i \frac{ux}{N})$ for
some frequency u evenly dividing N. We assume that the function is not constant: $0 < u < N$.
The Fourier coefficients for this function are

$$A(x) = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} a(k) \exp\left(2\pi i \frac{kx}{N}\right)$$

$$= \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} \exp\left(-2\pi i \frac{uk}{N}\right) \exp\left(2\pi i \frac{kx}{N}\right)$$

$$= \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} \exp\left(2\pi i \frac{k(x-u)}{N}\right).$$

It is a well-known fact that sums of the form $\sum_{k=0}^{N-1} \exp(2\pi i k \frac{r}{N})$ vanish unless $r = 0 \bmod N$.
(We prove a more general fact in appendix B.) Since $u < N$, $A(x) = 0$ unless $x - u = 0$: only
A(u) will be non-zero.
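
The example can be confirmed numerically with a direct implementation of the definition. The sketch below follows the positive-exponent, $1/\sqrt{N}$-normalized convention used in the text (many libraries use the opposite sign and a different normalization); the function name `dft` and the values N = 16, u = 4 are illustrative.

```python
import cmath
from math import sqrt

def dft(a):
    """A(x) = (1/sqrt(N)) * sum_k a(k) * exp(2*pi*i*k*x/N), computed directly."""
    size = len(a)
    return [sum(a[k] * cmath.exp(2j * cmath.pi * k * x / size) for k in range(size))
            / sqrt(size)
            for x in range(size)]

N, u = 16, 4
a = [cmath.exp(-2j * cmath.pi * u * x / N) for x in range(N)]
A = dft(a)
nonzero = [x for x in range(N) if abs(A[x]) > 1e-9]
```

Only the coefficient at x = u survives, and its value is $\sqrt{N}$, since all N terms of the sum equal 1 there.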


Any periodic complex-valued function a with period r and frequency $u = N/r$ can be approximated,
using its Fourier series, as the sum of exponential functions whose frequencies are multiples
of u. Since the Fourier transform is linear, the Fourier coefficients A(x) of any periodic function
will be the sum of the Fourier coefficients of the component functions. If r divides N evenly, the
Fourier coefficients A(x) will be non-zero only for those x that are multiples of $u = N/r$. If r
does not divide N evenly, the result only approximates this behavior, with the highest values at
the integers closest to multiples of $u = N/r$ and low values at integers far from these multiples.


Fast Fourier Transform The fast Fourier transform (FFT) is an efficient implementation of the
discrete Fourier transform (DFT) when N is a power of two: $N = 2^n$. The key to the implementation
is that $F^{(n)}$ can be recursively decomposed in terms of Fourier transforms for lower powers
of 2.

Let $\omega_{(n)}$ be the Nth root of unity, $\omega_{(n)} = \exp(\frac{2\pi i}{N})$. The entries of the $N \times N$ matrix $F^{(n)}$ for
the $N = 2^n$ dimensional Fourier transform are simply

$$F^{(n)}_{ij} = \frac{1}{\sqrt{N}}\, \omega_{(n)}^{ij}$$

where we index the entries of all $N \times N$ matrices by $i \in \{0, \ldots, N-1\}$ and $j \in \{0, \ldots, N-1\}$.

Let $F^{(k)}$ be the $2^k \times 2^k$ matrix for the $2^k$-dimensional Fourier transform.

Let $I^{(k)}$ be the $2^k \times 2^k$ identity matrix. Let $D^{(k)}$ be the $2^k \times 2^k$ diagonal matrix with elements
$\omega_{(k+1)}^0, \ldots, \omega_{(k+1)}^{2^k-1}$. Let $R^{(k)}$ be the permutation shown in figure 7.7 that maps the vector entries at
index $2i$ to position $i$ and at index $2i + 1$ to position $i + 2^{k-1}$. The entries of the $2^k \times 2^k$ matrix
for $R^{(k)}$ are given by

$$R^{(k)}_{ij} = \begin{cases} 1 & \text{if } 2i = j \\ 1 & \text{if } 2(i - 2^{k-1}) + 1 = j \\ 0 & \text{otherwise.} \end{cases}$$

Figure 7.7

An example of the shuffle transform R.


The reader may verify (see exercise 7.7) that 


= _ 


/(*-!). D (k ~" 
/(*-!) — /)(*-!) 


/rd’-h 

0 


The reader may consult any standard reference on the fast Fourier transform for an imple¬ 
mentation based on this recursive decomposition that uses only O(nN) steps. 
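As an illustration of how the recursive decomposition turns into an $O(nN)$ procedure, here is a minimal sketch in Python, assuming NumPy. It is not taken from any particular reference implementation; the function names `fft` and `direct_dft` are hypothetical, and the normalization follows the $1/\sqrt{N}$ convention and positive exponent used in this section. Splitting the input into even- and odd-indexed entries plays the role of $R^{(k)}$, the two recursive calls play the role of the block-diagonal matrix, and the final combination applies the matrix built from $I^{(k-1)}$ and $D^{(k-1)}$.

```python
import numpy as np

def direct_dft(a):
    """Direct O(N^2) evaluation of A(x) = N^(-1/2) sum_k a(k) exp(2 pi i k x / N)."""
    N = len(a)
    idx = np.arange(N)
    return (np.exp(2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)) @ a

def fft(a):
    """Recursive FFT following the block decomposition; len(a) must be a power of 2."""
    a = np.asarray(a, dtype=complex)
    N = len(a)
    if N == 1:
        return a
    even = fft(a[0::2])    # shuffle: even-indexed entries go to the first half
    odd = fft(a[1::2])     # and odd-indexed entries to the second half
    twiddle = np.exp(2j * np.pi * np.arange(N // 2) / N)   # diagonal of D^(k-1)
    return np.concatenate([even + twiddle * odd,
                           even - twiddle * odd]) / np.sqrt(2)

rng = np.random.default_rng(0)
a = rng.normal(size=16) + 1j * rng.normal(size=16)
print(np.allclose(fft(a), direct_dft(a)))
```

Each of the $n$ levels of recursion does work proportional to $N$, which is where the $O(nN)$ count comes from.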


7.8.2 The Quantum Fourier Transform 

The quantum Fourier transform (QFT) is a variant of the discrete Fourier transform, which, like
the fast Fourier transform (FFT), assumes that $N = 2^n$. The amplitudes $a_x$ of any quantum state
$\sum_x a_x|x\rangle$ can be viewed as a function of $x$, which we will denote by $a(x)$. The quantum Fourier
transform operates on a quantum state by sending

\[ \sum_x a(x)|x\rangle \to \sum_x A(x)|x\rangle \]

where the $A(x)$ are the Fourier coefficients of the discrete Fourier transform of $a(x)$, and $x$
ranges over the integers between 0 and $N - 1$. If the state were measured in the standard basis right
after the Fourier transform was performed, the probability that the resulting state would be $|x\rangle$
would be $|A(x)|^2$. The quantum Fourier transform generalizes from a classical complex-valued
function in quite a different way from how $U_f$ generalizes a binary classical function $f$; here
the output of the classical function is placed in the complex amplitudes of the final superposition
state, and there is no need for an additional output register.

Applying the quantum Fourier transform to a state whose amplitudes are given by a periodic
function $a(x) = a_x$ with period $r$, where $r$ is a power of 2, would result in $\sum_x A(x)|x\rangle$, where
$A(x)$ is zero except when $x$ is a multiple of $\frac{N}{r}$. Thus, were the state measured in the standard basis
at this point, the result would be one of the basis vectors $|x\rangle$ with label a multiple of $\frac{N}{r}$, say $|j\frac{N}{r}\rangle$.
The quantum Fourier transform behaves in only approximately this way when the period is not a
power of 2 (does not divide $N = 2^n$): states labeled with integers near multiples of $\frac{N}{r}$ would be
measured with high probability. The larger the power of 2 used as a base for the transform, the
closer the approximation.
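The exact and approximate behavior described above can be seen in a small classical simulation. This is a sketch under stated assumptions: NumPy is available, the helper names `qft_probabilities` and `periodic_state` are hypothetical, and the values $N = 64$, $r = 8$, and $r = 6$ are chosen only for illustration. NumPy's `ifft` uses the positive exponent with a $1/N$ factor, so multiplying by $\sqrt{N}$ gives the convention of this section.

```python
import numpy as np

def qft_probabilities(amplitudes):
    """Measurement probabilities |A(x)|^2 after a QFT of the (normalized) state."""
    a = np.asarray(amplitudes, dtype=complex)
    a = a / np.linalg.norm(a)
    A = np.fft.ifft(a) * np.sqrt(len(a))
    return np.abs(A) ** 2

def periodic_state(N, r, offset=0):
    """Amplitudes equal to 1 at offset, offset + r, offset + 2r, ... and 0 elsewhere."""
    a = np.zeros(N)
    a[offset::r] = 1
    return a

N = 64
exact = qft_probabilities(periodic_state(N, 8, offset=3))    # r divides N
approx = qft_probabilities(periodic_state(N, 6, offset=3))   # r does not divide N

exact_support = [int(x) for x in np.flatnonzero(exact > 1e-12)]
approx_peaks = sorted(int(x) for x in np.argsort(approx)[-6:])
print(exact_support)   # multiples of N/r = 8
print(approx_peaks)    # integers nearest to multiples of N/r = 10.67
```

With $r = 8$ all of the probability sits on multiples of 8. With $r = 6$ the six most likely outcomes are the integers closest to multiples of $64/6$, and together they carry most, but not all, of the probability.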

While the implementation of the quantum Fourier transform is based on that of the fast Fourier
transform, the quantum Fourier transform can be implemented exponentially faster, needing only
$O(n^2)$ operations, not the $O(nN)$ operations needed for the fast Fourier transform. We will see
in appendix B.2.2 that the quantum Fourier transform is a special case of a more general class of
efficiently implementable quantum transformations.

7.8.3 A Quantum Circuit for Fast Fourier Transform 

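We show how to implement efficiently the quantum Fourier transform $U_F^{(n)}$ for $N = 2^n$, defined
by

\[ U_F^{(n)} : |k\rangle \to \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} \exp\left(\frac{2\pi i k x}{N}\right) |x\rangle. \]

The quantum Fourier transform for $N = 2$ is the familiar Hadamard transformation:

\[
\begin{aligned}
|0\rangle &\to \frac{1}{\sqrt{2}} \sum_{x=0}^{1} e^{0} |x\rangle = \frac{1}{\sqrt{2}} (|0\rangle + |1\rangle) \\
|1\rangle &\to \frac{1}{\sqrt{2}} \sum_{x=0}^{1} e^{\pi i x} |x\rangle = \frac{1}{\sqrt{2}} (|0\rangle - |1\rangle).
\end{aligned}
\]

Using the recursive decomposition of section 7.8.1,

\[
U_F^{(k+1)} = \frac{1}{\sqrt{2}}
\begin{pmatrix} I^{(k)} & D^{(k)} \\ I^{(k)} & -D^{(k)} \end{pmatrix}
\begin{pmatrix} U_F^{(k)} & 0 \\ 0 & U_F^{(k)} \end{pmatrix}
R^{(k+1)},
\]
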

we can compute $U_F^{(n)}$. All of the component matrices are unitary (the multiplicative factor in
front goes with the first matrix). It remains to be shown how these components can be efficiently
realized on a quantum computer.

We proceed as follows:

1. We can write the rotation $R^{(k+1)}$ as

\[ R^{(k+1)} = \sum_{i=0}^{2^k-1} |i\rangle\langle 2i| + |i + 2^k\rangle\langle 2i + 1|. \]

It can be accomplished by a simple permutation of the $k + 1$ qubits: qubit 0 becomes qubit $k$ and
qubits 1 through $k$ become qubits 0 through $k - 1$. This cyclic permutation of $k + 1$ qubits can be
implemented using $k$ swap operations of section 5.2.4.

2. The transformation

\[
\begin{pmatrix} U_F^{(k)} & 0 \\ 0 & U_F^{(k)} \end{pmatrix} = I \otimes U_F^{(k)}
\]

can be implemented by recursively applying the quantum Fourier transform to qubits 0 through $k - 1$.

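3. For $k > 1$, the $2^k \times 2^k$ diagonal matrix of phase shifts $D^{(k)}$ can be recursively decomposed
as

\[ D^{(k)} = D^{(k-1)} \otimes \begin{pmatrix} 1 & 0 \\ 0 & \omega_{(k+1)} \end{pmatrix}. \]

Recursively decomposing $D^{(k)}$ in this way, the transformation $D^{(k)}$ can be implemented by applying
$\begin{pmatrix} 1 & 0 \\ 0 & \omega_{(i+1)} \end{pmatrix}$ to the $i$th most significant of the $k$ qubits, qubit $k - i$, for $1 \le i \le k$. Thus
altogether $D^{(k)}$ can be implemented using $k$ single-qubit gates.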

4. Given this implementation of $D^{(k)}$, then

\[ \frac{1}{\sqrt{2}} \begin{pmatrix} I^{(k)} & D^{(k)} \\ I^{(k)} & -D^{(k)} \end{pmatrix} \]

can be implemented with only $k$ gates.

\[
\begin{aligned}
\frac{1}{\sqrt{2}} \begin{pmatrix} I^{(k)} & D^{(k)} \\ I^{(k)} & -D^{(k)} \end{pmatrix}
&= \frac{1}{\sqrt{2}} (|0\rangle + |1\rangle)\langle 0| \otimes I^{(k)} + \frac{1}{\sqrt{2}} (|0\rangle - |1\rangle)\langle 1| \otimes D^{(k)} \\
&= (H|0\rangle\langle 0|) \otimes I^{(k)} + (H|1\rangle\langle 1|) \otimes D^{(k)} \\
&= (H \otimes I^{(k)}) (|0\rangle\langle 0| \otimes I^{(k)} + |1\rangle\langle 1| \otimes D^{(k)}).
\end{aligned}
\]

The transformation $(|0\rangle\langle 0| \otimes I^{(k)} + |1\rangle\langle 1| \otimes D^{(k)})$ applies $D^{(k)}$ to the low-order bits controlled by
the high-order bit: it applies $D^{(k)}$ to bits 0 through $k - 1$ if bit $k$ is one. This controlled version
of $D^{(k)}$ can be implemented as a sequence of $k$ two-qubit controlled gates that apply each of the
single-qubit operations making up $D^{(k)}$ to bit $i$ controlled by bit $k$.

Figure 7.8

A recursive quantum circuit for Fourier transform.


Since $D^{(k)}$ and $R^{(k)}$ can both be implemented with $O(k)$ operations, the $k$th step in the recursion
adds $O(k)$ steps to the implementation of $U_F^{(n)}$. Overall, $U_F^{(n)}$ takes $O(n^2)$ gates to implement,
which is exponentially faster than the $O(n2^n)$ steps required for classical fast Fourier transform.
A circuit for this implementation of the quantum Fourier transform is shown in figure 7.8.

A recursive program for this implementation would be

define QFT |x[1]⟩ = H|x⟩

       QFT |x[n]⟩ =

           Swap |x_0⟩|x_1 ... x_{n-1}⟩

           QFT |x_0 ... x_{n-2}⟩

           |x_{n-1}⟩ control D^{(n-1)} |x_0 ... x_{n-2}⟩

           H|x_{n-1}⟩.
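The recursive program can be mirrored by a classical matrix computation, which is useful only as a check on small cases since the matrices have size $2^n \times 2^n$. The sketch below assumes NumPy and a convention in which qubit $n - 1$ is the most significant bit of the basis label (the leftmost factor in a Kronecker product); the function names `rotation`, `phase_shifts`, and `qft` are hypothetical. Each line of the product corresponds to one line of the recursive program, read from right to left.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def rotation(m):
    """R^(m) on m qubits: basis index 2i goes to i, index 2i+1 goes to i + 2^(m-1)."""
    N, half = 2 ** m, 2 ** (m - 1)
    R = np.zeros((N, N))
    for i in range(half):
        R[i, 2 * i] = 1
        R[i + half, 2 * i + 1] = 1
    return R

def phase_shifts(k):
    """D^(k) as a tensor product of k single-qubit phase gates, high-order qubit first."""
    D = np.ones((1, 1), dtype=complex)
    for j in range(2, k + 2):                 # omega_(2), ..., omega_(k+1)
        D = np.kron(D, np.diag([1, np.exp(2j * np.pi / 2 ** j)]))
    return D

def qft(n):
    """Matrix of U_F^(n) assembled from the recursive program."""
    if n == 1:
        return H.astype(complex)
    k = n - 1
    I = np.eye(2 ** k)
    P0, P1 = np.diag([1, 0]), np.diag([0, 1])
    controlled_D = np.kron(P0, I) + np.kron(P1, phase_shifts(k))
    return (np.kron(H, I)                     # H on the high-order qubit
            @ controlled_D                    # D^(k) controlled by the high-order qubit
            @ np.kron(np.eye(2), qft(k))      # QFT on the k low-order qubits
            @ rotation(n))                    # Swap (the rotation R^(k+1))

N = 16
idx = np.arange(N)
direct = np.exp(2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)
print(np.allclose(qft(4), direct))
```

The comparison with the directly computed matrix confirms, for $n = 4$, that the four components compose to the quantum Fourier transform defined at the start of this section.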

7.9 References 

Classical circuit complexity is discussed in Goldreich [131] and Vollmer [279]. Watrous [281] 
provides an excellent and extensive survey of quantum complexity theory. An older survey by 
Cleve [85], unlike Watrous, discusses quantum communication complexity as well as quantum 
computational complexity. Brassard [59] and de Wolf [97] both survey quantum communication 
complexity. 

Deutsch described the solution to the 1-qubit version of his problem in [99]. The three subroutines
were discussed in Hogg, Mochon, Polak, and Rieffel [154]. Deutsch and Jozsa presented the
$n$-qubit version and its solution in [102]. Simon’s problem with solution appeared in [256]. The
Bernstein-Vazirani problem first appears in Bernstein and Vazirani [49] as part of a more complex 
algorithm. The simpler explanation of the algorithm appears in Mermin [209]. Both Grover [144] 
and Terhal and Smolin [269] independently rediscovered the problem and quantum algorithms for 
its solution. The latter reference contains a proof of the complexity of the best possible classical 
algorithm. 

The example of section 7.5.4 was presented by Brassard, Cleve, and Tapp [60] in the context of 
their study of quantum communication complexity. Various notions of communication complexity 
are discussed in [155, 96, 74].

Beals et al. [35] proved that for a broad class of problems quantum computation cannot provide
any speedup. Their methods were used by others to provide lower bounds for other types of 
problems. Ambainis [21] found another powerful method for establishing lower bounds. 

Bernstein and Vazirani [49] analyze the accumulation of errors, and their result implies that
the needed precision for simulating a quantum computation grows only linearly with the number
of steps. Bennett et al. [41] provide another, more accessible, account. Yao [287] shows that any
function computable in polynomial time on a quantum Turing machine is computable by a polynomial
quantum circuit. The same is true for classical Turing machines and Boolean circuits,
and proofs can be found in the paper by Pippenger and Fischer [229] or Papadimitriou’s book
[223]. Papadimitriou’s book also contains a comprehensive definition of classical complexity
classes, as does Johnson [164]. Boppana and Sipser [55] discuss classical complexity for
Boolean circuits. Formal proofs of the complexity results given here can be found in the papers
of Berthiaume-Brassard [50] and Bernstein-Vazirani [49].

The idea of Fourier transformation goes back to Joseph Fourier’s 1822 book The Analytical
Theory of Heat [123]. The algorithm for fast Fourier transformation was proposed by Cooley and
Tukey [88]; more comprehensive treatments can be found in Brigham [66], Cormen et al. [90],
Knuth [182], and Strang [264].

The quantum Fourier transform was developed independently by Shor [250] and Coppersmith
[89], and by Deutsch in an unpublished paper. Ekert and Jozsa [112] provide an attractive presentation
of quantum Fourier transforms, including some of the circuit diagrams we give here.
Approximate implementations of the quantum Fourier transform are analyzed in Barenco et al.
[32]. For instance, it is shown that for some applications approximate computations may lead to
better performance.

Aharonov, Landau, and Makowsky [12], Yoran and Short [288], and Browne [67] show that 
quantum Fourier transforms can be simulated efficiently on a classical computer in the sense that 
there exist efficient classical algorithms that provide a means of sampling from a distribution 
identical to that obtained by measuring the output of the quantum Fourier transform when the 
input is a product state. Browne exhibits a method for efficient classical simulation of the quantum 
Fourier transform applied to a broader class of input states. It is not known how to simulate 
efficiently the output distribution of the quantum Fourier transform for certain other input states. 
One such state is the output of the modular exponentiation circuit of section 6.4.6 when applied to 
a superposition of all inputs. Were it possible classically and efficiently to simulate sampling from 
such a distribution, Shor’s algorithm, described in chapter 8, would be classically simulatable, 
yielding an efficient classical solution to the factoring problem. For this reason, it is suspected 
that such simulation is impossible. 

7.10 Exercises 

Exercise 7.1. In the standard circuit model of section 5.6, the computation takes place by applying
quantum gates. Only at the end are measurements performed. Imagine a computation that proceeds
instead as follows. Gates $G_0, G_1, \dots, G_n$ are applied, then qubit $i$ is measured in the standard
basis and never used again. If the result of the measurement is 0, the gates $G_{01}, G_{02}, \dots, G_{0k}$
are applied. If the result is 1, then gates $G_{11}, G_{12}, \dots, G_{1l}$ are applied. Find a single quantum
circuit in the standard circuit model, with only measurement at the very end, that carries out this
computation.


Exercise 7.2. Prove equation 7.1:

\[
\sum_{x=0}^{2^n-1} (-1)^{x \cdot y} =
\begin{cases}
2^n & \text{if } y = 0 \\
0 & \text{otherwise.}
\end{cases}
\]



Exercise 7.3. Let $f$ and $g$ be functions from the space of $n$-bit strings to the space of $m$-bit
strings. Design a quantum subroutine that changes the sign of exactly the basis states $|x\rangle$ such
that $f(x) = g(x)$, and which is efficient if $f$ and $g$ have efficient implementations.

Exercise 7.4. 

a. Prove that any classical algorithm requires at least two calls to $C_f$ to solve Deutsch’s problem.

b. Prove that any classical algorithm requires $2^{n-1} + 1$ calls to $C_f$ to solve the Deutsch-Jozsa
problem with certainty.

c. Describe a classical approach to the Deutsch-Jozsa problem that solves it with high probability
using fewer than $2^{n-1} + 1$ calls. Calculate the success probability of your approach as a function
of the number of calls.

Exercise 7.5. Show that a classical solution to Simon’s problem requires $O(2^{n/2})$ calls to the
black box, and describe such a classical algorithm.

Exercise 7.6. Show directly that, in the distributed computation algorithm of section 7.5.4, when
$u = v$, $|\langle x, y|\psi\rangle|^2 = 0$ for all $x \ne y$.

Exercise 7.7. Fast Fourier transform decomposition.

a. For $k < l$, write the entries $F^{(k)}_{ij}$ of the $2^k \times 2^k$ matrix for the Fourier transform $U_F^{(k)}$ in terms
of $\omega_{(l)}$.

b. Find $m$ in terms of $k$ such that $-\omega_{(k)}^i = \omega_{(k)}^{m+i}$ for all $i \in \mathbf{Z}$.

c. Compute the product

\[
\begin{pmatrix} I^{(k-1)} & D^{(k-1)} \\ I^{(k-1)} & -D^{(k-1)} \end{pmatrix}
\begin{pmatrix} U_F^{(k-1)} & 0 \\ 0 & U_F^{(k-1)} \end{pmatrix},
\]

ultimately writing each entry as a power of $\omega_{(k)}$.

d. Let $A$ be any $2^k \times 2^k$ matrix with columns $A_j$. The product matrix $A R^{(k)}$ is just a permutation
of the columns. Where does column $A_j$ end up in the product $A R^{(k)}$?

e. Verify that

\[
U_F^{(k)} = \frac{1}{\sqrt{2}}
\begin{pmatrix} I^{(k-1)} & D^{(k-1)} \\ I^{(k-1)} & -D^{(k-1)} \end{pmatrix}
\begin{pmatrix} U_F^{(k-1)} & 0 \\ 0 & U_F^{(k-1)} \end{pmatrix}
R^{(k)}.
\]


Exercise 7.8. Even though we know little about quantum hardware, it makes sense that we may not 
want to require multiple qubit transformations that involve physically distant qubits, since these 
may be difficult to implement. To avoid such transformations, we can modify the implementation 
we gave very slightly. 

a. Give a quantum circuit like that of figure 7.8 for the Fourier transform that does not swap 
qubits but changes the order of the output qubits instead. 

b. Give a complete quantum circuit for the Fourier transform $U_F^{(3)}$ that contains only single-qubit
transformations and two-qubit transformations on adjacent qubits. You may want to use the
two-qubit swap operator defined in section 5.2.4.

8 Shor's Algorithm


In 1994, inspired by Simon’s algorithm, Peter Shor found a bounded probability polynomial-time
quantum algorithm for factoring integers. Since the 1970s, researchers have searched for
efficient algorithms for factoring integers. The most efficient classical algorithm known today,
the number field sieve, is superpolynomial in the size of the input. The input to the algorithm
is $M$, the number to be factored. The input $M$ is given as a list of $M$’s digits, so the size of the
input is taken to be $m = \lceil \log M \rceil$. The number field sieve requires $O(\exp(m^{1/3}))$ steps. People
were confident enough that factoring could not be done efficiently that the security of many
cryptographic systems, such as the widely used RSA algorithm, depends on the computational
difficulty of this problem. Shor’s result surprised the community at large, prompting widespread
interest in quantum computing.

Shor’s factoring algorithm provides a fast means for finding the period of a function. A standard
classical reduction of the factoring problem to the problem of finding the period of a certain function
has long been known. Shor’s algorithm uses quantum parallelism to produce a superposition
of all the values of this function in one step; it then uses the quantum Fourier transform to create 
efficiently a state in which most of the amplitude is in states close to multiples of the reciprocal of 
the period. With high probability, measuring the state yields information from which, by classical 
means, the period can be extracted. The period is then used to factor M. 

Section 7.8.2 covered the crux of the quantum part of Shor’s algorithm: the quantum Fourier 
transform. The remaining complications are classical, particularly the extraction of the period 
from the measured value. 

Section 8.1 explains the classical reduction of factoring to the problem of finding the period 
of a function. Section 8.2 explains the details of Shor’s algorithm, and section 8.3 walks through 
Shor’s algorithm in a specific case. Section 8.4 analyzes the efficiency of Shor’s algorithm. Section 
8.5 describes a variant of Shor’s algorithm in which a measurement performed in the course of 
the algorithm is omitted. Section 8.6 defines two problems that are solved by generalizations 
of Shor’s factoring algorithm: the discrete logarithm problem and the Abelian hidden subgroup 
problem. Appendix B describes the generalizations of Shor’s algorithm that solve these problems
and discusses the difficulty of the general hidden subgroup problem.

8.1 Classical Reduction to Period-Finding

The order of an integer $a$ modulo $M$ is the smallest integer $r > 0$ such that $a^r = 1 \bmod M$; if
no such integer exists, the order is said to be infinite. Two integers are relatively prime if they
share no prime factors. As long as $a$ and $M$ are relatively prime, the order of $a$ is finite. Consider
the function $f(k) = a^k \bmod M$. Because $a^k = a^{k+r} \bmod M$ if and only if $a^r = 1 \bmod M$, for $a$
relatively prime to $M$, the order $r$ of $a$ modulo $M$ is the period of $f$. If $a^r = 1 \bmod M$ and $r$ is
even, we can write

\[ (a^{r/2} + 1)(a^{r/2} - 1) = 0 \bmod M. \]

As long as neither $a^{r/2} + 1$ nor $a^{r/2} - 1$ is a multiple of $M$, both $a^{r/2} + 1$ and $a^{r/2} - 1$ have
nontrivial common factors with $M$. Thus, if $r$ is even, $a^{r/2} + 1$ and $a^{r/2} - 1$ are likely to have a
nontrivial common factor with $M$. This property suggests a strategy for factoring $M$:

• Randomly choose an integer $a$ and determine the period $r$ of $f(k) = a^k \bmod M$.

• If $r$ is even, use the Euclidean algorithm to compute efficiently the greatest common divisor of
$a^{r/2} + 1$ and $M$.

• Repeat if necessary.

In this way, factoring $M$ has been converted to a different hard problem, that of computing
the period of the function $f(k) = a^k \bmod M$. Shor’s quantum algorithm attacks the problem of
efficiently finding the period of a function.
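The reduction can be illustrated entirely classically on small numbers. The sketch below uses only the Python standard library; the function names `order` and `factor_from_order` are hypothetical, and the order is found by brute force, which takes time exponential in the size of $M$ and stands in for the step that Shor's algorithm performs on a quantum computer.

```python
from math import gcd

def order(a, M):
    """Smallest r > 0 with a^r = 1 mod M, found by brute force."""
    if gcd(a, M) != 1:
        raise ValueError("a and M must be relatively prime")
    r, power = 1, a % M
    while power != 1:
        power = (power * a) % M
        r += 1
    return r

def factor_from_order(a, M):
    """Try to split M using the order of a; return a nontrivial factor or None."""
    r = order(a, M)
    if r % 2 == 1:
        return None                      # odd period: choose another a
    half = pow(a, r // 2, M)             # a^(r/2) mod M
    for candidate in (gcd(half + 1, M), gcd(half - 1, M)):
        if 1 < candidate < M:
            return candidate
    return None                          # a^(r/2) = -1 mod M: choose another a

print(order(11, 21), factor_from_order(11, 21))
```

For $M = 21$ and $a = 11$ the order is 6, and $\gcd(11^3 + 1, 21) = 3$ is a nontrivial factor.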

8.2 Shor's Factoring Algorithm 

Before giving the details of Shor’s factoring algorithm in sections 8.2.1 and 8.2.2, we give a
high-level outline. Quantum computation is required only for parts 2 and 3; the other parts would
most likely be carried out on a classical computational device.

1. Randomly choose an integer $a$ such that $0 < a < M$. Use the Euclidean algorithm to determine
whether $a$ and $M$ are relatively prime. If not, we have found a factor of $M$. Otherwise, apply the
rest of the algorithm.

2. Use quantum parallelism to compute $f(x) = a^x \bmod M$ on the superposition of inputs, and
apply a quantum Fourier transform to the result. Section 8.2.2 shows that it suffices to consider
input values $x \in \{0, \dots, 2^n - 1\}$, where $n$ is such that $M^2 \le 2^n < 2M^2$.

3. Measure. With high probability, a value $v$ close to a multiple of $\frac{2^n}{r}$ will be obtained.

4. Use classical methods to obtain a conjectured period $q$ from the value $v$.

5. When $q$ is even, use the Euclidean algorithm to check efficiently whether $a^{q/2} + 1$ (or $a^{q/2} - 1$)
has a nontrivial common factor with $M$.

6. Repeat all steps if necessary.

Sections 8.2.1 and 8.2.2 describe Shor’s algorithm in more detail. Section 8.3 runs through an
example with specific values of $M$ and $a$.


8.2.1 The Quantum Core 

After using quantum parallelism to create the superposition $\sum_x |x, f(x)\rangle$, part 2 of Shor’s
algorithm applies the quantum Fourier transform.

Since $f(x) = a^x \bmod M$ can be computed efficiently classically, the results of chapter 6 imply
that the transformation

\[ U_f : |x\rangle|0\rangle \to |x\rangle|f(x)\rangle \]

has an efficient implementation. (We discuss the efficiency of the entire algorithm in section 8.4.)
We use quantum parallelism with $U_f$ to obtain the superposition

\[ \frac{1}{\sqrt{2^n}} \sum_{x=0}^{2^n-1} |x\rangle|f(x)\rangle. \tag{8.1} \]

The analysis simplifies slightly if we now measure the second register. Section 8.5 shows
how the measurement can be omitted without affecting the efficiency or the result of the
algorithm.

Measuring the second register randomly returns a value $u$ for $f(x)$ and the state becomes

\[ C \sum_x g(x)|x\rangle|u\rangle, \]

where

\[
g(x) =
\begin{cases}
1 & \text{if } f(x) = u \\
0 & \text{otherwise,}
\end{cases}
\tag{8.2}
\]

and $C$ is the appropriate scale factor. The value of $u$ is of no interest and, since the second register
is no longer entangled with the first, we can ignore it. Because the function $f(x) = a^x \bmod M$
has the property that $f(x) = f(y)$ if and only if $x$ and $y$ differ by a multiple of the period, the
values of $x$ that remain in the sum, those with $g(x) \ne 0$, differ from each other by multiples of the
period. Thus, the function $g$ has the same period as the function $f$. If we could somehow obtain
the value of two successive terms in the sum, we would have the period. Unfortunately, the laws
of quantum physics permit only one measurement from which we can obtain only one random
value of $x$. Repeating the process does not help because we would be unlikely to measure the
same value $u$ of $f(x)$, so the two values of $x$ obtained from two runs would have no relation to
each other.

Applying the quantum Fourier transform to the first register of this state produces

\[ U_F\left(C \sum_x g(x)|x\rangle\right) = C' \sum_c G(c)|c\rangle, \tag{8.3} \]

where $G(c) = \sum_x g(x) \exp(\frac{2\pi i c x}{2^n})$. The analysis of section 7.8.2 tells us that when the period
$r$ of the function $g(x)$ is a power of two, $G(c) = 0$ except when $c$ is a multiple of $2^n/r$. When
the period $r$ does not divide $2^n$, the transform approximates the exact case, so most of the amplitude
is attached to integers close to multiples of $\frac{2^n}{r}$. For this reason, measurement yields, with high
probability, a value $v$ close to a multiple of $\frac{2^n}{r}$. The quantum core of the algorithm has now
been completed. The next section examines the classical use of $v$ to obtain a good guess for the
period.

8.2.2 Classical Extraction of the Period from the Measured Value 

This section sketches a purely classical algorithm for extracting the period from the measured
value $v$ obtained from the quantum core of Shor’s algorithm. When the period $r$ happens to be a
power of 2, the quantum Fourier transform gives exact multiples of $2^n/r$, which makes the period
easy to extract. In this case, the measured value $v$ is equal to $j\frac{2^n}{r}$ for some $j$. Most of the time $j$
and $r$ will be relatively prime, in which case reducing the fraction $\frac{v}{2^n}$ to its lowest terms will yield
a fraction $\frac{j}{r}$ whose denominator is the period $r$. The rest of this section explains how to obtain a
good guess for $r$ when it is not a power of 2.

In general the quantum Fourier transform gives only approximate multiples of the scaled frequency,
which complicates the extraction of the period from the measurement. When the period is
not a power of 2, a good guess for the period can be obtained from the continued fraction expansion
of $\frac{v}{2^n}$ described in box 8.1. Shor shows that with high probability $v$ is within $\frac{1}{2}$ of some multiple
of $\frac{2^n}{r}$, say $j\frac{2^n}{r}$. The reason why $n$ was chosen to satisfy $M^2 \le 2^n < 2M^2$ becomes apparent when
we try to extract the period $r$ from the measured value $v$. In the high-probability case that

\[ \left| v - j\frac{2^n}{r} \right| < \frac{1}{2} \]

for some $j$, the left inequality $M^2 \le 2^n$ implies that

\[ \left| \frac{v}{2^n} - \frac{j}{r} \right| < \frac{1}{2 \cdot 2^n} \le \frac{1}{2M^2}. \]

In general, the difference between two distinct fractions $\frac{p}{q}$ and $\frac{p'}{q'}$ with denominators less than $M$
is bounded:

\[ \left| \frac{p}{q} - \frac{p'}{q'} \right| = \left| \frac{pq' - p'q}{qq'} \right| > \frac{1}{M^2}. \]

Thus, there is at most one fraction $\frac{p}{q}$ with denominator $q < M$ such that $\left|\frac{v}{2^n} - \frac{p}{q}\right| < \frac{1}{2M^2}$. In the
high probability case that $v$ is within $\frac{1}{2}$ of $j\frac{2^n}{r}$, this fraction will be $\frac{j}{r}$. The fraction $\frac{p}{q}$ can be
computed using a continued fraction expansion (see box 8.1). We take the denominator $q$ of
the obtained fraction as our guess for the period. This guess will be correct whenever $j$ and $r$ are
relatively prime.

Box 8.1

Continued Fraction Expansion

The unique fraction with denominator less than $M$ that is within $\frac{1}{2M^2}$ of $\frac{v}{2^n}$ can be obtained efficiently
from the continued fraction expansion of $\frac{v}{2^n}$ as follows. Let $[x]$ be the greatest integer less than or equal to $x$.
Using the sequences

\[
\begin{aligned}
a_0 &= \left[\frac{v}{2^n}\right] \\
\epsilon_0 &= \frac{v}{2^n} - a_0 \\
a_i &= \left[\frac{1}{\epsilon_{i-1}}\right] \\
\epsilon_i &= \frac{1}{\epsilon_{i-1}} - a_i \\
p_0 &= a_0 \\
p_1 &= a_1 a_0 + 1 \\
p_i &= a_i p_{i-1} + p_{i-2} \\
q_0 &= 1 \\
q_1 &= a_1 \\
q_i &= a_i q_{i-1} + q_{i-2},
\end{aligned}
\]

compute the first fraction $\frac{p_i}{q_i}$ such that $q_i < M \le q_{i+1}$.
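The sequences of box 8.1 translate directly into a short program. The following is a minimal sketch using exact rational arithmetic from the Python standard library; the function name `period_guess` is hypothetical, and the starting values $p_{-1} = 1$, $q_{-1} = 0$ are the usual convention that makes the general recurrence reproduce $p_1$ and $q_1$ as given in the box.

```python
from fractions import Fraction

def period_guess(v, n, M):
    """Last convergent p/q of v / 2^n with denominator q < M, returned as (p, q)."""
    x = Fraction(v, 2 ** n)
    a = x.numerator // x.denominator       # a_0
    eps = x - a                            # epsilon_0
    p_prev, q_prev = 1, 0                  # p_{-1}, q_{-1}
    p, q = a, 1                            # p_0, q_0
    while eps != 0:
        inv = 1 / eps
        a = inv.numerator // inv.denominator        # a_i
        eps = inv - a                               # epsilon_i
        p_next, q_next = a * p + p_prev, a * q + q_prev
        if q_next >= M:
            break                          # q_i < M <= q_{i+1}: stop at p_i / q_i
        p_prev, q_prev, p, q = p, q, p_next, q_next
    return p, q

print(period_guess(427, 9, 21))
```

With $v = 427$, $n = 9$, and $M = 21$ the sketch stops at the convergent $\frac{5}{6}$, so the guess for the period is 6. Exact fractions are used rather than floating point so that the integer parts $a_i$ are never affected by rounding.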


8.3 Example Illustrating Shor's Algorithm 

This section illustrates the operation of Shor’s algorithm as it attempts to factor the integer $M = 21$.
Since $M^2 = 441 \le 2^9 < 882 = 2M^2$, take $n = 9$. Since $\lceil \log M \rceil = m = 5$, the second register
requires five qubits. Thus, the state

\[ \frac{1}{\sqrt{2^9}} \sum_{x=0}^{2^9-1} |x\rangle|f(x)\rangle \tag{8.4} \]

is a 14-qubit state, with nine qubits in the first register and five in the second.

Suppose the randomly chosen integer is $a = 11$ and that quantum measurement of the second
register of the superposition of equation 8.1

\[ \frac{1}{\sqrt{2^9}} \sum_{x=0}^{2^9-1} |x\rangle|f(x)\rangle \tag{8.5} \]

produces $u = 8$. The state of the first register after this measurement is shown in figure 8.1, which
clearly shows the periodicity of $f$.

Figure 8.2 shows the result of applying the quantum Fourier transform to this state; it is the
graph of the fast Fourier transform of the function shown in figure 8.1. In this particular example,
the period of $f$ does not divide $2^n$, which is why the probability distribution has some spread
around multiples of $2^n/r$ instead of having a single spike at each of these values.
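The distribution plotted in figure 8.2 can be reproduced with a small classical simulation, which is feasible here only because $2^9 = 512$ amplitudes fit easily in memory. This is a sketch assuming NumPy; the variable names are illustrative, and the outcome $u = 8$ of the first measurement is simply assumed, as in the text. NumPy's `ifft` is rescaled by $\sqrt{N}$ to match the convention of section 7.8.

```python
import numpy as np

M, a, n = 21, 11, 9
N = 2 ** n
f = np.array([pow(a, x, M) for x in range(N)])

u = 8                                    # assumed outcome of measuring the second register
g = (f == u).astype(float)               # indicator function of equation 8.2
state = g / np.linalg.norm(g)            # first register after the measurement

# QFT: amplitudes proportional to sum_x g(x) exp(2 pi i c x / N).
G = np.fft.ifft(state) * np.sqrt(N)
prob = np.abs(G) ** 2

peaks = sorted(int(c) for c in np.argsort(prob)[-6:])
print(peaks)                             # integers nearest to multiples of 512/6
print(round(float(prob[427]), 3))        # probability of measuring v = 427
```

The six most likely outcomes are the integers closest to multiples of $2^9/6 \approx 85.33$, and $v = 427$ is one of them.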

Suppose that measurement of the state returns $v = 427$. Since $v$ and $2^n$ are relatively prime, we
use the continued fraction expansion of box 8.1 to obtain a guess $q$ for the period. The following
table shows a trace of the continued fraction algorithm:

i    a_i    p_i    q_i    ε_i
0    0      0      1      0.8339844
1    1      1      1      0.1990632
2    5      5      6      0.02352941
3    42     211    253    0.5

The algorithm terminates with $6 = q_2 < M \le q_3$. Thus, $q = 6$ is our guess for the period of
$f$.

Since 6 is even, $a^{6/2} - 1 = 11^3 - 1 = 1330$ and $a^{6/2} + 1 = 11^3 + 1 = 1332$ are likely to have
a common factor with $M$. In this particular example, $\gcd(21, 1330) = 7$ and $\gcd(21, 1332) = 3$.


8.4 The Efficiency of Shor's Algorithm 

This section considers the efficiency of Shor’s algorithm, examining both the efficiency of each 
part in terms of the number of gates or classical steps needed to implement the part and the 
expected number of times the algorithm would need to be repeated. 

The Euclidean algorithm on integers $x > y$ requires at most $O(\log x)$ steps, so both parts 1
and 5 require $O(\log M) = O(m)$ steps. The continued fraction algorithm used in part 4 is related
to the Euclidean algorithm and also requires $O(m)$ steps. Part 3 is a measurement of $m$ qubits or,
as section 8.5 shows, can be omitted altogether. Part 2 consists of the computation of $U_f$ and the
computation of the quantum Fourier transform. Section 7.8.2 showed that the quantum Fourier
transform on $m$ qubits requires $O(m^2)$ steps. The algorithm for modular exponentiation given in
section 6.4, which requires $O(n^3)$ steps, could be used to implement $U_f$. The transformation $U_f$ can be
implemented more efficiently using an algorithm for modular exponentiation, described by Shor,
that is based on the most efficient classical method known, and runs in $O(n^2 \log n \log\log n)$ time
and $O(n \log n \log\log n)$ space. These results show that the overall runtime of a single iteration
of Shor’s algorithm is dominated by the computation of $U_f$, and that the overall time complexity
for a single iteration of the algorithm is $O(n^2 \log n \log\log n)$.

To show that Shor’s algorithm is efficient, we also need to show that the parts do not need to
be repeated too many times. Four things can go wrong:

• The period of $f(x) = a^x \bmod M$ could be odd.

• Part 5 could yield $M$ as $M$’s factor.

• The value $v$ obtained in part 3 might not be close enough to a multiple of $\frac{2^n}{r}$.

• A multiple $j\frac{2^n}{r}$ of $\frac{2^n}{r}$ is obtained from $v$, but $j$ and $r$ could have a common factor, in which case
the denominator $q$ is actually a factor of the period, not the period itself.

The first two problems appear in the classical reduction, and standard classical arguments
bound the probabilities as at most 1/2. For the case in which the period $r$ divides $2^n$, problem 3
does not arise. Shor shows that, in the general case, $v$ is within 1/2 of a multiple of $\frac{2^n}{r}$ with high
probability. As for problem 4, when $r$ divides $2^n$, it is not hard to see that every outcome $v = j\frac{2^n}{r}$
is equally likely: the state after taking the quantum Fourier transform is

\[ C' \sum_{c=0}^{2^n-1} G(c)|c\rangle, \]

\[ G(c) = \sum_{x \in X_u} \exp\left(2\pi i \frac{cx}{2^n}\right) = \exp\left(2\pi i \frac{c x_0}{2^n}\right) \sum_{y=0}^{2^n/r - 1} \exp\left(2\pi i \frac{c r y}{2^n}\right), \]

where $X_u = \{x \mid f(x) = u\}$ and $x_0$ is the smallest element of $X_u$. As we mentioned in section 7.8.1, the final sum is $2^n/r$ when $c$ is a multiple
of $2^n/r$, and 0 otherwise. Thus, in this case, any $j \in \{0, \dots, r-1\}$ is equally likely. From $j$,
we obtain the period $r$ exactly when $r$ and $j$ are relatively prime, $\gcd(r, j) = 1$. The number of
positive integers less than $r$ that are relatively prime to $r$ is given by the famous Euler $\phi$ function,
which is known to satisfy $\phi(r)/r > \delta / \log\log r$ for some constant $\delta$. Thus we need to repeat the parts
only $O(\log\log r)$ times in order to achieve a high probability of success. The argument for the
general case in which $r$ does not divide $2^n$ is somewhat more involved but yields the same result.
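The fraction of outcomes $j$ from which the period can be read off directly is $\phi(r)/r$, which is easy to tabulate for small $r$. The sketch below uses only the standard library and counts by brute force; the function name `coprime_fraction` is hypothetical. It treats each $j \in \{0, \dots, r-1\}$ as equally likely, which is the exact case analyzed above.

```python
from math import gcd

def coprime_fraction(r):
    """Fraction of j in {0, ..., r-1} with gcd(j, r) = 1, that is, phi(r) / r."""
    return sum(1 for j in range(r) if gcd(j, r) == 1) / r

for r in (6, 8, 30, 210):
    print(r, coprime_fraction(r))
```

The reciprocal of this fraction is the expected number of runs needed before a usable $j$ appears; for $r = 6$ only $j = 1$ and $j = 5$ qualify, so about three runs are needed on average.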

8.5 Omitting the Internal Measurement 

Part 3 of Shor’s algorithm, the measurement of the second register of the state in equation 8.1 to 
obtain u, can be skipped entirely. This section first describes the intuition for why this measurement 
can be omitted and then gives a formal argument. 

If the measurement is omitted, the state consists of a superposition of several periodic functions,
one for each value of $f(x)$, all of which have the same period. By the linearity of quantum
transformations, applying the quantum Fourier transformation leads to a superposition of the Fourier
transforms of these functions. The different functions remain distinct parts of the superposition
and do not interfere with each other because each one corresponds to a different value $u$ of the
second register. Measuring the first register gives a value from one of these Fourier transforms,
which as before will be close to $j\frac{2^n}{r}$ for some $j$ and so can be used to obtain the period in the same
way as before. Seeing how this argument can be formalized illustrates some of the subtleties of
working with quantum superpositions.

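Let $X_u = \{x \mid f(x) = u\}$. The state of equation 8.1 can be written as

\[
\frac{1}{\sqrt{2^n}} \sum_{x=0}^{2^n-1} |x\rangle|f(x)\rangle
= \frac{1}{\sqrt{2^n}} \sum_{u \in R} \sum_{x \in X_u} |x\rangle|u\rangle
= \frac{1}{\sqrt{2^n}} \sum_{u \in R} \left( \sum_{x=0}^{2^n-1} g_u(x)|x\rangle \right) |u\rangle,
\]

where $R$ is the range of $f(x)$ and $g_u$ is the family of functions indexed by $u$ such that

\[
g_u(x) =
\begin{cases}
1 & \text{if } f(x) = u \\
0 & \text{otherwise.}
\end{cases}
\]

The amplitudes in states with different $u$ in the second register can never interfere (add or cancel)
with each other. The result of applying the transform $U_F \otimes I$ to the preceding state can be written

\[
U_F \otimes I \left( \frac{1}{\sqrt{2^n}} \sum_{u \in R} \left( \sum_{x=0}^{2^n-1} g_u(x)|x\rangle \right) |u\rangle \right)
= \frac{1}{\sqrt{2^n}} \sum_{u \in R} \left( \sum_{c=0}^{2^n-1} G_u(c)|c\rangle \right) |u\rangle,
\]

where $G_u(c)$ is the discrete Fourier transform of $g_u(x)$. This result is a superposition of the
possible states of equation 8.3 over all possible $u$. Since the $g_u$ all have the same period, measuring
the first part of this state returns a $c$ close to a multiple of $2^n/r$, just as happened when the second
register was measured as part of the original algorithm.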
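The claim that omitting the measurement does not change the statistics of the first register can be checked numerically for the example of section 8.3. This is a sketch assuming NumPy, with $M = 21$, $a = 11$, $n = 9$ taken from that example and all variable names hypothetical. The two-register state is stored as a matrix whose rows are indexed by $x$ and whose columns are indexed by $u$, so that $U_F \otimes I$ acts on each column separately.

```python
import numpy as np

M, a, n = 21, 11, 9
N = 2 ** n
f = np.array([pow(a, x, M) for x in range(N)])

# State of equation 8.1: amplitude 1/sqrt(N) at each pair (x, f(x)).
state = np.zeros((N, M))
state[np.arange(N), f] = 1 / np.sqrt(N)

# U_F tensor I: QFT (book convention) applied along the first register only.
transformed = np.fft.ifft(state, axis=0) * np.sqrt(N)
no_measure = np.sum(np.abs(transformed) ** 2, axis=1)   # marginal distribution of c

# With the measurement: average the distribution for each u, weighted by P(u).
mixture = np.zeros(N)
for u in np.unique(f):
    g = (f == u).astype(float)
    p_u = g.sum() / N
    G = np.fft.ifft(g / np.linalg.norm(g)) * np.sqrt(N)
    mixture += p_u * np.abs(G) ** 2

print(np.allclose(no_measure, mixture))
```

The two distributions agree, reflecting the fact that components with different $u$ in the second register never interfere.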





8.6 Generalizations 


Shor’s original paper contained not only a quantum factoring algorithm, but also a related algorithm
for the discrete logarithm problem. Further generalizations of Shor’s quantum algorithms
have been obtained for problems falling in the general class of hidden subgroup problems.
The next two sections, sections 8.6.1 and 8.6.2, require knowledge of group theory. Readers
unfamiliar with group theory should just skim these sections; the results they contain
will not be used later in the book, apart from appendix B and the section of the final chapter
that reviews more recent algorithmic results. The basics of group theory are reviewed in
boxes.

8.6.1 The Discrete Logarithm Problem

The discrete logarithm problem is also of cryptographic importance; the security of Diffie-Hellman, 
ElGamal, and elliptic curve public key encryption, for example, rests on the classical 
difficulty of this problem. In fact, all standard public key encryption systems and digital signature 
schemes are based on either factoring or the discrete logarithm problem. Electronic commerce 
and communication rely on public key encryption and digital signature schemes for their 
security and efficiency. It is currently unclear whether a public key encryption system believed to be 
secure against classical and quantum attacks can be established before quantum computers are 
built. If quantum computers win this race, the practical implications will be substantial. Once 
quantum computers become a reality, all currently accepted public key encryption systems will 
be completely insecure. 

Let $\mathbb{Z}_p^*$ be the group of integers $\{1, \ldots, p-1\}$ under multiplication modulo $p$, and let $b$ be 
a generator for this group (given one generator $g$, the power $g^k$ is also a generator exactly when $k$ is relatively prime to $p-1$). The discrete logarithm of 
$y \in \mathbb{Z}_p^*$ with respect to base $b$ is the element $x \in \mathbb{Z}_p^*$ such that $b^x = y \bmod p$. 

Discrete Logarithm Problem Given a prime $p$, a base $b \in \mathbb{Z}_p^*$, and an arbitrary element $y \in \mathbb{Z}_p^*$, 
find an $x \in \mathbb{Z}_p^*$ such that $b^x = y \bmod p$. 
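
For very small primes the definition can be explored by exhaustive search. The sketch below is only an illustration of the problem statement, not an efficient algorithm; the function name `discrete_log` and the toy values $p = 23$, $b = 5$ are assumptions. It returns the smallest non-negative exponent, so it reports $0$ rather than $p - 1$ for $y = 1$.

```python
def discrete_log(b, y, p):
    """Smallest x >= 0 with b**x = y (mod p), found by exhaustive search.

    Takes up to p - 1 multiplications, which is exponential in the number
    of bits of p; this is why the approach fails for large p.
    """
    value = 1  # b**0
    for x in range(p - 1):
        if value == y % p:
            return x
        value = (value * b) % p
    return None  # y is not a power of b modulo p

# Toy instance (hypothetical values): 5 generates the group for p = 23.
x = discrete_log(5, 8, 23)
print(x, pow(5, x, 23))
```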

For large $p$, this problem is computationally difficult to solve. The discrete logarithm problem 
can be generalized to arbitrary finite cyclic groups $G$, though for some large $G$ it is not difficult 
to solve classically. The discrete logarithm problem is a special case of the Abelian hidden subgroup 
problem. Appendix B describes a general algorithm for the Abelian hidden subgroup problem 
that yields essentially Shor’s original discrete logarithm algorithm in the special case. The next 
section discusses hidden subgroup problems. 

8.6.2 Hidden Subgroup Problems 

The hidden subgroup framework subsumes many of the problems and quantum algorithms we have 
discussed. Understanding this framework requires experience with group theory. The definition 
of a group is reviewed in box 8.2, which also contains examples. Box 8.3 defines some properties 
of groups and subgroups. Box 8.4 discusses Abelian groups. 

The Hidden Subgroup Problem Let $G$ be a group. Suppose a subgroup $H < G$ is implicitly 
defined by a function $f$ on $G$ in that $f$ is constant and distinct on every coset of $H$. Find a set of 
generators for $H$. 

The aim is to find a polylogarithmic algorithm that computes a set of generators for $H$ in 
$O((\log|G|)^k)$ steps for some $k$. The difficulty of the problem depends not only on $G$ and $f$ but 
also on what is meant by given a group $G$. Some useful properties may be expensive to compute 
from certain descriptions of a group and immediate from others. For example, computing the size 
of a group from certain types of descriptions, such as a defining set of generators and relations, 
is known to be computationally hard. Also, we can hope to find a solution in poly-log time only 
if $f$ itself is computable in poly-log time. 
Box 8.2 

Groups 

A group is a non-empty set $G$ with an associative binary operation, denoted $\circ$, satisfying 

• (closure) for any two elements $g_1$ and $g_2$ of $G$, the product $g_1 \circ g_2$ is also in $G$, 

• (identity) there is an identity element $e \in G$ such that $e \circ g = g \circ e = g$ for every $g \in G$, and 

• (inverse) every element $g \in G$ has an inverse $g^{-1} \in G$ such that $g \circ g^{-1} = g^{-1} \circ g = e$. 

The associative binary operation $\circ$ is generally referred to as the group's product. Often the product 
is indicated simply by juxtaposition, with the $\circ$ omitted: $g_1 \circ g_2$ is written simply as $g_1 g_2$. For some 
groups, other notation is used for the binary operation. 

Some examples of groups: 

• The integers $\{0, 1, \ldots, n-1\}$ form a group under addition modulo $n$. This group is denoted $\mathbb{Z}_n$, 
with binary operator $+$. 

• The set of $k$-bit strings, $\mathbb{Z}_2^k$, forms a group under bitwise addition modulo 2. 

• For $p$ prime, the set of integers $\{1, \ldots, p-1\}$ forms a group $\mathbb{Z}_p^*$ under multiplication modulo $p$. 

• The set $U(n)$ of all unitary operators on an $n$-dimensional vector space $V$ forms a group. 

• The Pauli group consisting of the eight elements $\pm I$, $\pm X$, $\pm Y$, and $\pm Z$ forms a group. 

• The extended Pauli group consisting of the sixteen elements $\omega I$, $\omega X$, $\omega Y$, and $\omega Z$, where 
$\omega \in \{1, -1, i, -i\}$, forms a group. 


Box 8.3 

Properties of Groups and Subgroups 


The number of elements $|G|$ of a group is called its order. A group is said to be finite if its order is a 
finite number; otherwise it is an infinite group. 

A subset $H$ of $G$ that is a group in its own right, under the restriction of $G$'s product to $H$, is called a 
subgroup of $G$. The subgroup relation is written $H < G$. For example, for any integer $m$ dividing $n$, the 
set of multiples of $m$ forms a subgroup of $\mathbb{Z}_n$. Also, any subspace $W$ of a vector space $V$ is a subgroup 
of the group $V$ under vector addition. The Pauli group is a subgroup of the unitary group $U(n)$. 

The order of an element $g$ is the size of the subgroup of $G$ that it generates. The order of an element 
must divide the order of the group. 

A set of generators of a group $G$ is a subset of $G$ such that all elements of $G$ can be written as a finite 
product of the generators and their inverses (in any order and allowing repeats). A set of generators 
of a group is independent if no generator can be written as a product of the other generators. A group 
is finitely generated if a finite set of generators exists. If a group can be generated by a single element 
it is cyclic. The set of generators for a given group is not unique in general. 

The centralizer, $Z(H)$, of a subgroup $H$ of $G$ is the set of elements of $G$ that commute with all 
elements of $H$: 

$$Z(H) = \{g \in G \mid gh = hg \text{ for all } h \in H\}.$$

For $H < G$, the centralizer $Z(H)$ of $H$ is a subgroup of $G$. 
Box 8.4 

Abelian Groups 

A group is Abelian if its group product $\circ$ is commutative: $g_1 \circ g_2 = g_2 \circ g_1$. 
The group $\mathbb{Z}_n$ is Abelian, but the group of unitary operators $U(n)$ is not Abelian. 

The product $G \times H$ of two groups $G$ and $H$, with products $\circ_G$ and $\circ_H$ respectively, is the set of 
pairs $\{(g, h) \mid g \in G, h \in H\}$ with the product $(g_1, h_1) \circ (g_2, h_2) = (g_1 \circ_G g_2, h_1 \circ_H h_2)$. 

The structure of finite Abelian groups is well understood. Every finite Abelian group is isomorphic 
to a product of one or more cyclic groups $\mathbb{Z}_{n_i}$. For example, for the product $n$ of two relatively 
prime integers $p$ and $q$, the group $\mathbb{Z}_n$ is isomorphic to $\mathbb{Z}_p \times \mathbb{Z}_q$. Any finite Abelian group $A$ has a 
unique decomposition (up to the ordering of the factors) into cyclic groups of prime power order. 
For a cyclic group $A$, the decomposition depends only on its order $|A|$. Let $|A| = \prod_i c_i$ be the prime factorization of $|A|$, 
where $c_i = p_i^{s_i}$ and the $p_i$ are distinct primes. Then 

$$A \cong \mathbb{Z}_{c_1} \times \mathbb{Z}_{c_2} \times \cdots \times \mathbb{Z}_{c_k}.$$


While the general hidden subgroup problem remains unsolved, a polylogarithmic bounded 
probability quantum algorithm for the general case of finite Abelian groups, specified in terms of 
their cyclic decomposition, exists. The cyclic decomposition for Abelian groups is described in 
box 8.4. 

Finite Abelian Hidden Subgroup Problem Let $G$ be a finite Abelian group with cyclic 
decomposition $G \cong \mathbb{Z}_{n_0} \times \cdots \times \mathbb{Z}_{n_L}$. Suppose $G$ contains a subgroup $H < G$ that is implicitly defined by 
a function $f$ on $G$ in that $f$ is constant and distinct on every coset of $H$. Find a set of generators 
for $H$. 


Example 8.6.1 Period-finding as a hidden subgroup problem. Period-finding can be rephrased 
as a hidden subgroup problem. Let $f$ be a periodic function on $\mathbb{Z}_N$ with period $r$ that divides 
$N$. The subgroup $H < \mathbb{Z}_N$ generated by $r$ is the hidden subgroup. Once a generator $h$ for $H$ 
has been found, the period $r$ can be found by taking the greatest common divisor of $h$ and $N$: 
$r = \gcd(h, N)$. 
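
The last step of example 8.6.1 is easy to check on a small case. In the sketch below the helper names and the values $N = 12$, $r = 4$ are assumptions for illustration; the hidden subgroup is $\{0, 4, 8\}$, and either of its generators yields the period through the greatest common divisor.

```python
from math import gcd

def generated_subgroup(h, N):
    """Elements of the subgroup of Z_N (under addition mod N) generated by h."""
    return sorted({(k * h) % N for k in range(N)})

def period_from_generator(h, N):
    """Recover the period r from any generator h of the hidden subgroup."""
    return gcd(h, N)

N = 12
for h in (4, 8):  # both elements generate the same subgroup of Z_12
    print(h, generated_subgroup(h, N), period_from_generator(h, N))
```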


In addition to period-finding, both Simon’s problem and the discrete logarithm problem 
are instances of the finite Abelian hidden subgroup problem. Recognizing how Simon’s 
problem can be viewed as a hidden subgroup problem is relatively easy. Understanding how the 
discrete logarithm problem is a special case of the hidden subgroup problem requires some 
ingenuity. 


Example 8.6.2 The discrete logarithm problem as a hidden subgroup problem. The discrete log 
problem asks: Given the group $G = \mathbb{Z}_p^*$, where $p$ is prime, a base $b \in G$, and an arbitrary 
element $y \in G$, find an $x \in G$ such that $b^x = y \bmod p$. Consider $f : G \times G \to G$ where 
$f(g, h) = b^{-g} y^h$. The set of elements satisfying $f(g, h) = 1$ is the hidden subgroup $H$ of $G \times G$ 
consisting of tuples of the form $(mx, m)$. From any generator of $H$, the element $(x, 1)$ can be 
computed. Thus, solving this hidden subgroup problem yields $x$, the solution to the discrete 
logarithm problem. 
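
A toy instance makes the structure of example 8.6.2 concrete. This is a sketch under stated assumptions: the values $p = 23$, $b = 5$, $x = 6$ and the helper names are hypothetical, and exponents are reduced modulo $p - 1$, the order of the group, which is one way of reading the pairs $(g, h)$. The modular inverse via `pow` with a negative exponent needs Python 3.8 or later.

```python
def f(g, h, b, y, p):
    """f(g, h) = b^(-g) * y^h mod p, as in example 8.6.2."""
    return (pow(b, -g, p) * pow(y, h, p)) % p

def recover_log(g, h, p):
    """Given (g, h) = (m*x, m) with m invertible mod p - 1, return x."""
    return (g * pow(h, -1, p - 1)) % (p - 1)

p, b, x = 23, 5, 6          # toy values; 5 generates the group for p = 23
y = pow(b, x, p)

# The pairs (m*x, m), with exponents taken modulo p - 1.
hidden = [((m * x) % (p - 1), m) for m in range(p - 1)]
print(all(f(g, h, b, y, p) == 1 for g, h in hidden))

# Any pair whose second entry is invertible mod p - 1 reveals x.
print(recover_log(*hidden[3], p))
```

Here `hidden[3]` is the pair $(18, 3)$, and multiplying $18$ by the inverse of $3$ modulo $22$ recovers $x = 6$.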


A crucial ingredient of Shor’s algorithm is the quantum Fourier transform. The quantum 
algorithm for Simon’s problem also uses a quantum Fourier transform; quantum Fourier transforms 
can be defined for all finite Abelian groups (and more generally all finite groups), and the quantum 
Fourier transform for the group $\mathbb{Z}_2^n$ is the Walsh-Hadamard transformation $W$. The solution to the 
hidden subgroup problem over an Abelian group $G$ uses the quantum Fourier transform over the 
group $G$. The Fourier transformation over a general finite group $G$ is defined in terms of 
the group representations of $G$. These ingredients are described in appendix B, which also 
describes the general solution to the finite Abelian hidden subgroup problem. It makes use of 
deeper group theory results than the rest of the book. No one knows how to solve the 
hidden subgroup problem over general non-Abelian groups. What progress has been made toward 
understanding the non-Abelian hidden subgroup problem is discussed in chapter 13. 

8.7 References 

Lenstra and Lenstra [193] describes the best currently known classical factoring algorithm, 
the number field sieve, including its $O(\exp(n^{1/3}))$ complexity. Some simpler but less efficient 
classical factoring algorithms are described in Knuth [182]. 

Shor’s algorithm first appeared in 1994 [250]. Shor later published an expanded version [253] 
that contains a detailed analysis of the complexity and the probability of success. 

The continued fraction expansion, and the approximations it gives, is described in detail in 
most standard number theory texts including Hardy and Wright [149]. Its efficiency and relation 
to the Euclidean algorithm is discussed in Knuth [182]. The Euler $\phi$ function and its properties 
are also discussed in standard number theory books such as Hardy and Wright [149]. 

Kitaev solved the general finite Abelian hidden subgroup problem [172]. Jozsa [165] and 
[112] provide accessible accounts of the quantum Fourier transform in the context of the hidden 
subgroup problem. The general hidden subgroup problem was introduced by Mosca and Ekert in 
[214]. 

Koblitz and Menezes, in their 2004 survey [183], give a detailed overview of proposed public 
key encryption schemes, including ones not based on factoring or the discrete logarithm problem, 
as well as the more standard public key schemes. Rieffel [242] discusses the practical 
implications of quantum computing for security. There are conferences in the field of post-quantum 
cryptography. The book Post-Quantum Cryptography [47] contains a compilation of papers on 
the implications of quantum computing for cryptography and overviews of some of the more 
promising directions. Perlner and Cooper [228] survey public key encryption and digital 
signature schemes that are not known to be vulnerable to quantum attacks and discuss design criteria 
that need to be met if such a system were to be deployed in the future. 

8.8 Exercises 

Exercise 8.1. Give the exact value of the scale factor $C$ in equation 8.2 in terms of properties of 
$f$ and $u$. 

Exercise 8.2. Show that with high probability $v$, the value obtained from the quantum core of 
Shor’s algorithm described in section 8.2.1, is within $\frac{1}{2}$ of some multiple of $2^n/r$. 

Exercise 8.3. Determine the efficiency of Shor’s algorithm in the general case when $r$ does not 
divide $2^n$. 

Exercise 8.4. Show that the probability that the period of $f(x) = a^x \bmod M$ is odd is at most 
$1/2$. 

Exercise 8.5. Show that in the general case in which $r$ does not divide $2^n$, the parts of Shor’s 
algorithm need to be repeated only $O(\log \log r)$ times in order to achieve a high probability of 
success. 

Exercise 8.6. Explain how Deutsch’s problem of section 7.3.1 is an instance of the hidden 
subgroup problem. 

Exercise 8.7. Explain how Simon’s problem is an instance of the hidden subgroup problem. 
9 Grover's Algorithm and Generalizations 


Grover’s algorithm is the most famous algorithm in quantum computing after Shor’s algorithm. 
Its status, however, differs from that of Shor’s in a number of respects. Shor’s algorithm solves a 
problem with clear practical consequences, but its application is focused on a narrow, if important, 
range of problems. In contrast, Grover’s algorithm and its many generalizations can be applied to 
a broad range of problems, although, as section 9.6 explains, there is debate as to how far-reaching 
the practical implications of Grover’s algorithm and its generalizations are. 

Grover’s algorithm solves a black box problem. It succeeds in finding a solution with $O(\sqrt{N})$ 
calls to the oracle, whereas the best possible classical approaches require $O(N)$ calls. Thus, unlike 
Shor’s algorithm, Grover’s algorithm is provably better than any possible classical algorithm. This 
query complexity improvement over the classical case translates to a speedup only under certain 
conditions; it depends on the efficiency with which the black box can be implemented, and on 
whether there is additional structure to the problem that can be exploited by classical and quantum 
algorithms. This issue will be discussed in section 9.6. Even when the query complexity result 
translates to a time complexity improvement, the speedup is much less than for Shor’s algorithm. 

The $O(\sqrt{N})$ query complexity of Grover’s algorithm is known to be optimal; no quantum 
algorithm can do better. This restriction is as important as the algorithm itself. It places a severe 
restriction on the power of quantum computation. Although Grover’s algorithm is usually 
presented as succeeding with high probability, unlike for Shor’s algorithm, variations that succeed 
with certainty are known. Grover’s algorithm is simpler and easier to grasp than Shor’s, and has 
an elegant geometric interpretation. 

Section 9.1 describes Grover’s algorithm and determines its query complexity. Section 9.2 
covers amplitude amplification, a generalization of Grover’s algorithm. It also provides a simple 
geometric view of the algorithm. The optimality of Grover’s algorithm is proved in section 
9.3. Section 9.4 shows how to derandomize Grover’s algorithm while preserving its efficiency. 
Section 9.5 generalizes Grover’s algorithm to handle cases in which the number of solutions is not 
known. Section 9.6 discusses black box implementability, explains under what circumstances the 
query complexity results translate into a speedup, and evaluates the extent of practical potential 
applications for Grover’s algorithm. 
9.1 Grover's Algorithm 

Grover’s algorithm uses amplitude amplification to search an unstructured set of $N$ elements. 
The problem is usually stated in terms of a Boolean function, or predicate, $P : \{0, \ldots, N-1\} \to \{0, 1\}$ 
that captures the property being searched for. The goal of the problem is to find a solution, an 
element $x$ with $P(x) = 1$. As in Simon’s problem and the Deutsch-Jozsa problem, the predicate 
$P$ is viewed as an oracle, or black box, and we will concern ourselves with the query complexity, 
the number of calls made to the oracle $P$. Given a black box that outputs $P(x)$ upon input of $x$, 
the best classical approaches must, in the single solution case, inspect an average of $N/2$ values; 
that is, they require an average of $N/2$ evaluations of the predicate $P$. Given a quantum black box $U_P$ 
that outputs 

$$\sum_x c_x |x\rangle |P(x)\rangle$$

upon input of 

$$\sum_x c_x |x\rangle |0\rangle,$$

Grover’s algorithm finds a solution with only $O(\sqrt{N})$ calls to $U_P$ in the single solution case. 
Grover’s algorithm iteratively increases the amplitudes $c_x$ of those values $x$ with $P(x) = 1$, so 
that a final measurement will return a value $x$ of interest with high probability. For practical 
applications of Grover’s algorithm, the predicate $P$ must be efficiently computable, but without 
enough structure to enable classical methods to gain on the quantum algorithm. 

9.1.1 Outline 

Grover’s algorithm starts with an equal superposition $|\psi\rangle = \frac{1}{\sqrt{N}} \sum_x |x\rangle$ of all $N$ values of the 
search space and repeatedly performs the same sequence of transformations: 

1. Apply $U_P$ to $|\psi\rangle$. 

2. Flip the sign of all basis vectors that represent a solution. 

3. Perform inversion about the average, a transformation that maps every amplitude $A - \delta$ to 
$A + \delta$, where $A$ is the average of the amplitudes. 

For the case of a single solution, figure 9.1 illustrates how these steps increase the amplitude of 
the basis vector of a solution. We now look at this process in detail. 
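
The outline can be tried out classically on a list of amplitudes. The following minimal sketch is an illustration only: it stores all $N$ amplitudes explicitly, so it gains nothing over classical search, and the function names and the choice $N = 16$ with solution $11$ are assumptions. The sign flip stands in for the effect of the oracle call.

```python
import math

def grover_iteration(amplitudes, is_solution):
    """One iteration: flip the sign of the solutions, then invert about the average."""
    flipped = [-a if is_solution(x) else a for x, a in enumerate(amplitudes)]
    average = sum(flipped) / len(flipped)
    # inversion about the average maps A - d to A + d, i.e. a to 2A - a
    return [2 * average - a for a in flipped]

def run_grover(N, is_solution, iterations):
    """Start from the equal superposition and iterate."""
    amplitudes = [1 / math.sqrt(N)] * N
    for _ in range(iterations):
        amplitudes = grover_iteration(amplitudes, is_solution)
    return amplitudes

amps = run_grover(16, lambda x: x == 11, iterations=3)
print(round(amps[11] ** 2, 4))  # probability of measuring the solution
```

After three iterations on sixteen elements nearly all of the probability sits on the single solution, while the squared amplitudes still sum to one.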

9.1.2 Setup 

Without loss of generality, let $N = 2^n$ for some integer $n$, and let $X$ be the state space generated 
by $\{|0\rangle, \ldots, |N-1\rangle\}$. Let $U_P$ be a quantum black box that acts as 

$$U_P : |x, a\rangle \to |x, P(x) \oplus a\rangle,$$

for all $x \in X$ and single-qubit states $|a\rangle$. 
Figure 9.1 

The iteration step of Grover’s algorithm is achieved by (a) changing the sign of the good elements and (b) inverting about 
the average. The case of a single solution is illustrated. 


Let $G = \{x \mid P(x)\}$ and $B = \{x \mid \neg P(x)\}$ denote the good and bad values respectively, and let 
the number of good states be a small fraction of the total number of states: 

$$|G| \ll N.$$

Let 

$$|\psi_G\rangle = \frac{1}{\sqrt{|G|}} \sum_{x \in G} |x\rangle$$

be an even superposition of all the good states, and 

$$|\psi_B\rangle = \frac{1}{\sqrt{|B|}} \sum_{x \in B} |x\rangle$$

be an even superposition of the bad ones. Then $|\psi\rangle = W|0\rangle$, an equal superposition of all $N$ 
values, can be written as a superposition of $|\psi_G\rangle$ and $|\psi_B\rangle$: 

$$|\psi\rangle = \frac{1}{\sqrt{2^n}} \sum_{x=0}^{2^n-1} |x\rangle = g_0 |\psi_G\rangle + b_0 |\psi_B\rangle,$$

where $g_0 = \sqrt{|G|/N}$ and $b_0 = \sqrt{|B|/N}$. 

The core of Grover’s algorithm is the repeated application of a unitary transformation 

$$Q : g_i |\psi_G\rangle + b_i |\psi_B\rangle \to g_{i+1} |\psi_G\rangle + b_{i+1} |\psi_B\rangle$$

that increases the amplitude $g_i$ of good states (and decreases $b_i$) until a maximal value is reached. 
After applying the amplitude amplifying transformation $Q$ an appropriate number of times $j$, 
almost all amplitude will have shifted to good states, so that $|b_j| \ll |g_j|$. At this point, 
measurement will return an $x \in G$ with high probability. The exact number of times $Q$ needs to be applied 
is on the order of $\sqrt{N}$ and depends on both $N$ and $|G|$. Section 9.1.4 presents a detailed analysis. 

9.1.3 The Iteration Step 

The transformation Q is achieved by changing the sign of the good elements and then inverting 
about the average. This section describes the implementation of these two steps in detail. Both 
steps take real amplitudes to real amplitudes, so we will refer only to real amplitudes throughout 
the argument. 

Changing the Sign of the Good Elements To change the sign in a superposition $\sum_x c_x |x\rangle$ of exactly 
those $|x\rangle$ such that $x \in G$, apply $S_G^\pi$. A sign change is simply a phase shift by $e^{i\pi} = -1$. Section 
7.4.2 showed that changing the sign of the good elements is accomplished by 

$$U_P : (g_i |\psi_G\rangle + b_i |\psi_B\rangle) \otimes H|1\rangle \to (-g_i |\psi_G\rangle + b_i |\psi_B\rangle) \otimes H|1\rangle.$$

The number of gates needed to change the sign on the good elements does not depend on $N$, 
but rather on how many gates it takes to compute $U_P$. 

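Inversion About the Average Inversion about the average sends $a|x\rangle$ to $(2A - a)|x\rangle$, where $A$ is 
the average of the amplitudes of all basis vectors in the superposition. (See figure 9.1.) It is easy 
to see that the transformation 

$$\sum_{i=0}^{N-1} a_i |x_i\rangle \to \sum_{i=0}^{N-1} (2A - a_i) |x_i\rangle$$

is performed by the unitary transformation 

$$D = \begin{pmatrix} \frac{2}{N} - 1 & \frac{2}{N} & \cdots & \frac{2}{N} \\ \frac{2}{N} & \frac{2}{N} - 1 & \cdots & \frac{2}{N} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{2}{N} & \frac{2}{N} & \cdots & \frac{2}{N} - 1 \end{pmatrix}.$$

This paragraph shows how to implement this transformation with $O(n) = O(\log_2 N)$ quantum 
gates. Following Grover, write $D = -W S_0^\pi W$, where $W$ is the Walsh-Hadamard transform 
and 

$$S_0^\pi = \begin{pmatrix} -1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$

is the phase shift by $\pi$ of the basis vector $|0\rangle$ described in section 7.4.2. To see that $D = -W S_0^\pi W$, 
let 

$$R = \begin{pmatrix} 2 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}.$$

Since $S_0^\pi = I - R$, 

$$-W S_0^\pi W = W(R - I)W = WRW - I.$$

Since $R_{ij} = 0$ for $i \neq 0$ or $j \neq 0$, 

$$(WRW)_{ij} = W_{i0} R_{00} W_{0j} = \frac{2}{N}$$

and $-W S_0^\pi W = WRW - I = D$. 

Putting inversion about the average together with changing the sign of the good elements yields 
the iteration transformation 

$$Q = -W S_0^\pi W S_G^\pi.$$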
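
The identity $D = -W S_0^\pi W$ can be confirmed numerically for a small number of qubits. The sketch below builds the matrices explicitly as nested lists, which takes exponential space and is meant only as a check; the helper names are assumptions, and nothing here reflects how the gates would be implemented on a quantum computer.

```python
import math

def walsh_hadamard(n):
    """Matrix of the Walsh-Hadamard transform W on n qubits."""
    N = 2 ** n
    scale = 1 / math.sqrt(N)
    # entry (i, j) is (-1)^(i . j) / sqrt(N), with i . j the bitwise inner product
    return [[scale * (-1) ** bin(i & j).count("1") for j in range(N)]
            for i in range(N)]

def matmul(A, B):
    """Plain product of two square matrices given as nested lists."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def inversion_about_average(n):
    """Return -W S_0 W, where S_0 flips the sign of the basis vector |0> only."""
    N = 2 ** n
    W = walsh_hadamard(n)
    S0 = [[(-1.0 if i == 0 else 1.0) if i == j else 0.0 for j in range(N)]
          for i in range(N)]
    WSW = matmul(matmul(W, S0), W)
    return [[-entry for entry in row] for row in WSW]

D = inversion_about_average(3)
print(round(D[0][0], 6), round(D[0][1], 6))  # diagonal 2/N - 1, off-diagonal 2/N
```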

9.1.4 How Many Iterations? 

This section examines the result of multiple applications of the iteration step $Q$, which combines 
changing the sign and inverting about the average, in order to determine the optimal number of 
times to apply $Q$. It shows that $Q$ is a fixed rotation and that the amplitude $g_i$ of good states varies 
periodically with the number of iterations. To find a solution with high probability, the number 
of iterations $i$ must be chosen carefully. To determine the correct number of iterations to use, we 
describe the result of applying $Q$ in terms of recurrence relations on $g_i$ and $b_i$. 

The iteration step $Q = D S_G^\pi$ transforms $g_i |\psi_G\rangle + b_i |\psi_B\rangle$ to $g_{i+1} |\psi_G\rangle + b_{i+1} |\psi_B\rangle$. First, 

$$S_G^\pi : g_i |\psi_G\rangle + b_i |\psi_B\rangle \to -g_i |\psi_G\rangle + b_i |\psi_B\rangle.$$

To compute the average amplitude $A_i$, the term $-g_i |\psi_G\rangle$ contributes $|G|$ amplitudes 
$\frac{-g_i}{\sqrt{|G|}}$ and $b_i |\psi_B\rangle$ contributes $|B|$ amplitudes $\frac{b_i}{\sqrt{|B|}}$. 

Thus, altogether 

$$A_i = \frac{\sqrt{|B|}\, b_i - \sqrt{|G|}\, g_i}{N}.$$



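Inversion about the average transforms 

$$D : -g_i |\psi_G\rangle + b_i |\psi_B\rangle \to \sum_{x \in G} \left( 2A_i + \frac{g_i}{\sqrt{|G|}} \right) |x\rangle + \sum_{x \in B} \left( 2A_i - \frac{b_i}{\sqrt{|B|}} \right) |x\rangle$$

$$= \left( 2A_i \sqrt{|G|} + g_i \right) |\psi_G\rangle + \left( 2A_i \sqrt{|B|} - b_i \right) |\psi_B\rangle$$

$$= g_{i+1} |\psi_G\rangle + b_{i+1} |\psi_B\rangle,$$

where 

$$g_{i+1} = 2A_i \sqrt{|G|} + g_i,$$

$$b_{i+1} = 2A_i \sqrt{|B|} - b_i.$$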

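Let $t$ be the probability that a random value in $\{0, \ldots, N-1\}$ satisfies $P$. Then $t = |G|/N$ 
and $1 - t = |B|/N$. Then 

$$A_i \sqrt{|B|} = \frac{|B|\, b_i - \sqrt{|B||G|}\, g_i}{N} = (1 - t) b_i - \sqrt{t(1-t)}\, g_i$$

and, similarly, 

$$A_i \sqrt{|G|} = \frac{\sqrt{|B||G|}\, b_i - |G|\, g_i}{N} = \sqrt{t(1-t)}\, b_i - t g_i.$$

The recurrence relation can be written in terms of $t$: 

$$g_{i+1} = (1 - 2t) g_i + 2\sqrt{t(1-t)}\, b_i,$$

$$b_{i+1} = (1 - 2t) b_i - 2\sqrt{t(1-t)}\, g_i,$$

where $g_0 = \sqrt{t}$ and $b_0 = \sqrt{1 - t}$. It is easy to verify that 

$$g_i = \sin((2i + 1)\theta)$$
$$b_i = \cos((2i + 1)\theta)$$

is a solution to these equations with $\sin\theta = \sqrt{t} = \sqrt{|G|/N}$. 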
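
The claim that the sine and cosine expressions solve the recurrence can be checked numerically. This is a small sketch; the function names and the sample value $t = 1/64$ are assumptions for illustration.

```python
import math

def iterate_recurrence(t, steps):
    """Iterate the recurrence for (g_i, b_i) starting from g_0 = sqrt(t), b_0 = sqrt(1 - t)."""
    g, b = math.sqrt(t), math.sqrt(1 - t)
    s = math.sqrt(t * (1 - t))
    history = [(g, b)]
    for _ in range(steps):
        # both right-hand sides use the old values of g and b
        g, b = (1 - 2 * t) * g + 2 * s * b, (1 - 2 * t) * b - 2 * s * g
        history.append((g, b))
    return history

def closed_form(t, i):
    """The proposed solution g_i = sin((2i+1)theta), b_i = cos((2i+1)theta)."""
    theta = math.asin(math.sqrt(t))
    return math.sin((2 * i + 1) * theta), math.cos((2 * i + 1) * theta)

t = 1 / 64
for i, (g, b) in enumerate(iterate_recurrence(t, 10)):
    g_exact, b_exact = closed_form(t, i)
    print(i, round(g, 6), round(g_exact, 6))
```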

We are now ready to compute the optimum number of iterations of $Q$. To maximize the 
probability of measuring a good state, and thus finding an element with the desired 
property $P$, we wish to choose $i$ such that $\sin((2i + 1)\theta) \approx 1$ or $(2i + 1)\theta \approx \pi/2$. For $|G| \ll N$ 
the angle $\theta$ becomes very small and $\sqrt{|G|/N} = \sin\theta \approx \theta$. Thus, $g_i$ will be maximal for 
$i \approx \frac{\pi}{4} \sqrt{N/|G|}$. 

Additional iterations will reduce the success probability of the algorithm. This situation is in 
contrast to many classical algorithms in which the greater the number of iterations the better the 
results. Using the equations for $g_i$ and $b_i$, for $t = 1/4$ the optimum number of iterations is 1, and 
for $t = 1/2$, no amount of iteration will improve the situation. 
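
These statements can be explored with a few lines of code. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the iteration count is obtained by rounding the solution of $(2i + 1)\theta = \pi/2$ to a nearby integer, which is one reasonable choice rather than the only one.

```python
import math

def success_probability(t, i):
    """Probability g_i^2 of measuring a good state after i iterations."""
    theta = math.asin(math.sqrt(t))
    return math.sin((2 * i + 1) * theta) ** 2

def optimal_iterations(t):
    """Integer i near the solution of (2i + 1) * theta = pi / 2."""
    theta = math.asin(math.sqrt(t))
    return max(0, round(math.pi / (4 * theta) - 0.5))

# One solution among 2**20 elements.
t = 2.0 ** -20
i = optimal_iterations(t)
print(i, success_probability(t, i))

# Iterating past the optimum lowers the success probability again.
print(success_probability(t, 2 * i))

# t = 1/4: a single iteration suffices; t = 1/2: iteration never helps.
print(success_probability(0.25, 1), success_probability(0.5, 3))
```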
Since every step of the iteration process has been written as a linear combination of $|\psi_G\rangle$ 
and $|\psi_B\rangle$ with real coefficients, Grover’s algorithm can be viewed as acting in the real 
two-dimensional subspace spanned by $|\psi_G\rangle$ and $|\psi_B\rangle$. The algorithm simply shifts amplitude from 
$|\psi_B\rangle$ to $|\psi_G\rangle$. This picture leads to an elegant geometric interpretation of Grover’s algorithm 
discussed in section 9.2.1. First, we describe a generalization of Grover’s algorithm, amplitude 
amplification, to which this geometric picture also applies. 

9.2 Amplitude Amplification 

The first step of Grover’s algorithm applies the iteration operator $Q = -W S_0^\pi W S_G^\pi$ to the initial 
state $W|0\rangle$. We can look at $W$ as a trivial algorithm that maps $|0\rangle$ to all possible values and thus to a 
solution with probability $|G|/N$. Suppose we have an algorithm $U$ such that $U|0\rangle$ gives an initial 
solution with a higher probability. This section shows that the analysis of section 9.1.4 generalizes directly 
to any algorithm $U$ such that $U|0\rangle$ has some amplitude in the good states. Amplitude amplification 
generalizes Grover’s algorithm by replacing the iteration operator $Q = -W S_0^\pi W S_G^\pi$ with 

$$Q = -U S_0^\pi U^{-1} S_G^\pi.$$

The rest of this section generalizes the argument of section 9.1.4 to obtain the same recurrence 
relations for this more general case. 

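Let $\mathcal{G}$ and $\mathcal{B}$ be the subspaces spanned by $\{|x\rangle \mid x \in G\}$ and $\{|x\rangle \mid x \notin G\}$ respectively, and let 
$P_G$ and $P_B$ be the associated projection operators. Let $|\psi\rangle = U|0\rangle$ be written as 

$$|\psi\rangle = g_0 |\psi_G\rangle + b_0 |\psi_B\rangle,$$

where $|\psi_G\rangle$ and $|\psi_B\rangle$ are the normalized projections of $|\psi\rangle$ onto the good and bad subspaces, 

$$|\psi_G\rangle = \frac{1}{g_0} P_G |\psi\rangle$$

and 

$$|\psi_B\rangle = \frac{1}{b_0} P_B |\psi\rangle,$$

with 

$$g_0 = |P_G |\psi\rangle|$$

and 

$$b_0 = |P_B |\psi\rangle|.$$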

For $U = W$, $|\psi_G\rangle$, $|\psi_B\rangle$, $g_0$, and $b_0$ are as in section 9.1.4. Here $g_0$ and $b_0$ are not determined by 
the number of solutions, but rather by the properties of $U$ relative to the good states. The states 
$|\psi_G\rangle$ and $|\psi_B\rangle$ need not be equal superpositions of the good and bad states respectively, but $g_0$ 
and $b_0$ are still real. Again, we let $t = g_0^2$ with $1 - t = b_0^2$, where $t$ should be thought of as the 
probability that measurement of the superposition $U|0\rangle$ yields a state that satisfies predicate $P$. 
The operator $U$ can be viewed as a reversible algorithm that maps $|0\rangle$ to a set of solutions in $G$ 
with a probability $t = |g_0|^2$. 

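To understand the effect of $Q = -U S_0^\pi U^{-1} S_G^\pi$, recall from section 7.4.2 that $S_0^\pi |\varphi\rangle$ can be 
written as $|\varphi\rangle - 2\langle 0|\varphi\rangle |0\rangle$. For an arbitrary state $|\psi\rangle$, 

$$U S_0^\pi U^{-1} |\psi\rangle = U \left( U^{-1} |\psi\rangle - 2\langle 0|U^{-1}|\psi\rangle |0\rangle \right)$$

$$= |\psi\rangle - 2\langle 0|U^{-1}|\psi\rangle\, U|0\rangle$$

$$= |\psi\rangle - 2\,\overline{\langle \psi|U|0\rangle}\, U|0\rangle.$$

Since $S_G^\pi |\psi_G\rangle = -|\psi_G\rangle$ and $S_G^\pi |\psi_B\rangle = |\psi_B\rangle$, 

$$Q|\psi_G\rangle = -U S_0^\pi U^{-1} S_G^\pi |\psi_G\rangle$$

$$= U S_0^\pi U^{-1} |\psi_G\rangle$$

$$= |\psi_G\rangle - 2\overline{g_0}\, U|0\rangle$$

$$= |\psi_G\rangle - 2 g_0 g_0 |\psi_G\rangle - 2 g_0 b_0 |\psi_B\rangle$$

$$= (1 - 2t) |\psi_G\rangle - 2\sqrt{t(1-t)}\, |\psi_B\rangle$$

and 

$$Q|\psi_B\rangle = -|\psi_B\rangle + 2\overline{b_0}\, U|0\rangle$$

$$= -|\psi_B\rangle + 2 b_0 g_0 |\psi_G\rangle + 2 b_0 b_0 |\psi_B\rangle$$

$$= -|\psi_B\rangle + 2(1 - t) \frac{g_0}{b_0} |\psi_G\rangle + 2(1 - t) |\psi_B\rangle$$

$$= (1 - 2t) |\psi_B\rangle + 2\sqrt{t(1-t)}\, |\psi_G\rangle.$$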

An arbitrary real superposition of $|\psi_G\rangle$ and $|\psi_B\rangle$ is transformed by $Q$ as follows: 

$$Q(g_i |\psi_G\rangle + b_i |\psi_B\rangle) = \left( g_i (1 - 2t) + 2 b_i \sqrt{t(1-t)} \right) |\psi_G\rangle + \left( b_i (1 - 2t) - 2 g_i \sqrt{t(1-t)} \right) |\psi_B\rangle,$$

which leads to the same recurrence relation as in the previous section, 

$$g_{i+1} = (1 - 2t) g_i + 2\sqrt{t(1-t)}\, b_i$$
$$b_{i+1} = (1 - 2t) b_i - 2\sqrt{t(1-t)}\, g_i,$$

with the solution 

$$g_i = \sin((2i + 1)\theta)$$
$$b_i = \cos((2i + 1)\theta)$$

for $\sin\theta = \sqrt{t} = g_0$. 

Thus, for small $g_0$, the amplitude $g_i$ will be maximal after $i \approx \frac{\pi}{4} \frac{1}{g_0}$ iterations. If the algorithm 
$U$ succeeds with probability $t$, then simple classical repetition of $U$ requires an average of $1/t$ 
iterations to find a solution. Amplitude amplification speeds up this process so that it takes only 
$O(\sqrt{1/t})$ tries to find a solution. If $U$ has no amplitude in the good states, $g_0$ will be zero 
and amplitude amplification will have no effect. Furthermore, just as no amount of iteration in 
Grover’s algorithm improves the probability if $t = 1/2$, if $g_0$ is large, amplitude amplification 
cannot improve the situation. For this reason, amplitude amplification applied to an algorithm $U$ 
that is the result of amplitude amplification does not improve the results. 
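
The generalization can be tested on a state $U|0\rangle$ that is not an equal superposition. The sketch below is an illustration under assumptions: the names and the sample vector are hypothetical, the amplitudes are taken to be real, and $U$ itself is never built. Only the vector $u = U|0\rangle$ is needed, because $U S_0^\pi U^{-1}$ acts as $|\psi\rangle \to |\psi\rangle - 2\langle u|\psi\rangle u$, as derived in this section.

```python
import math

def amplify(u, good, iterations):
    """Apply Q = -U S_0 U^{-1} S_G to the state u = U|0> (real amplitudes)."""
    state = list(u)
    for _ in range(iterations):
        # S_G: flip the sign of the good basis states
        state = [-a if x in good else a for x, a in enumerate(state)]
        # U S_0 U^{-1} = I - 2|u><u|, followed by the overall minus sign
        overlap = sum(ui * a for ui, a in zip(u, state))
        state = [-(a - 2 * overlap * ui) for a, ui in zip(state, u)]
    return state

def good_probability(state, good):
    """Probability that a measurement returns a good basis state."""
    return sum(state[x] ** 2 for x in good)

raw = [1, 2, 3, 4, 5, 6, 7, 8]          # deliberately non-uniform
norm = math.sqrt(sum(a * a for a in raw))
u = [a / norm for a in raw]
good = {0, 1}

t = good_probability(u, good)            # initial success probability g_0^2
theta = math.asin(math.sqrt(t))
for i in range(7):
    simulated = good_probability(amplify(u, good, i), good)
    predicted = math.sin((2 * i + 1) * theta) ** 2
    print(i, round(simulated, 6), round(predicted, 6))
```

In this example $t = 5/204$, so classical repetition needs about 41 tries on average, while four or five iterations of $Q$ already bring the success probability above 0.97.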

9.2.1 The Geometry of Amplitude Amplification 

The reasoning behind amplitude amplification, including the optimal number of iterations of 
$Q$ to perform, can be reduced to a simple argument in two-dimensional Euclidean 
geometry. Let $|\psi_G\rangle$, $|\psi_B\rangle$, and $Q = -U S_0^\pi U^{-1} S_G^\pi$ be as defined before. This section shows that the 
entire discussion of amplitude amplification, and Grover’s algorithm in particular, reduces to a 
simple geometric argument about rotations in the two-dimensional real subspace generated by 
$\{|\psi_G\rangle, |\psi_B\rangle\}$. 

By the definition of $|\psi_G\rangle$ and $|\psi_B\rangle$, the initial state $U|0\rangle = g_0 |\psi_G\rangle + b_0 |\psi_B\rangle$ has real amplitudes 
$g_0$ and $b_0$, so is in the two-dimensional real plane spanned by $\{|\psi_G\rangle, |\psi_B\rangle\}$. The smaller the success 
probability $t$, the closer $U|0\rangle$ is to $|\psi_B\rangle$. Let $\beta$ be the angle between $U|0\rangle$ and $|\psi_G\rangle$ illustrated 
in figure 9.2. The angle $\beta$ depends only on the probability $t = g_0^2$ that the initial state $U|0\rangle$, if 
measured, gives a solution: $\cos(\beta) = \langle \psi_G|U|0\rangle = g_0$. The rest of this section explains how each 
iteration of Grover’s algorithm rotates the state by a fixed angle in the direction of the desired 
state. To maximize the amplitude in the good states, we iterate until the state is close to $|\psi_G\rangle$. From 
the simple geometry of the situation, we can determine both the optimal number of iterations and 
the probability that the run succeeds. 

Figure 9.2 

The initial state $U|0\rangle$ in the basis $\{|\psi_G\rangle, |\psi_B\rangle\}$. 

Figure 9.3 

The transformation $S_G^\pi$ reflects $|\psi_0\rangle$ about $|\psi_B\rangle$, resulting in the state $|\psi_1\rangle$. 

Amplitude amplification, and Grover’s algorithm as the special case when $U = W$, consists 
of repeated applications of $Q = -U S_0^\pi U^{-1} S_G^\pi$. To understand this transformation geometrically, 
recall from section 7.4.2 that the transformation $S_G^\pi$ can be viewed as a reflection about the 
hyperplane perpendicular to $|\psi_G\rangle$. In the plane spanned by $\{|\psi_G\rangle, |\psi_B\rangle\}$, this hyperplane reduces 
to the one-dimensional space spanned by $|\psi_B\rangle$. Figure 9.3 illustrates how $S_G^\pi$ maps an arbitrary 
state $|\psi_0\rangle$ in the $\{|\psi_G\rangle, |\psi_B\rangle\}$ subspace to $|\psi_1\rangle = S_G^\pi |\psi_0\rangle$. Similarly, the transformation $S_0^\pi$ is a 
reflection about the hyperplane orthogonal to $|0\rangle$. Since $U S_0^\pi U^{-1}$ differs from $S_0^\pi$ by a change of 
basis, it is a reflection about the hyperplane orthogonal to $U|0\rangle$. The effect of this transformation 
on $|\psi_1\rangle$ is shown in figure 9.4. The final negative sign reverses the direction of the state vector, 
shown in figure 9.5. (Strictly speaking, this negative sign is unnecessary, since it does nothing to 
the quantum state: it is a global phase change, so it is physically irrelevant. However, since we 
are drawing our pictures in the plane, not in projective space, the negative sign makes it easier to 
see what is going on.) Recall from Euclidean geometry that the concatenation of two reflections 
is a rotation of twice the angle between the axes of the two reflections. The two axes of reflection 
in this case are perpendicular to $U|0\rangle$ and $|\psi_G\rangle$ respectively, so the angle between the axes of 
reflection is $-\beta$ where $\cos\beta = g_0$ as before. The two reflections perform a rotation by $-2\beta$, and 
the final negation amounts to a rotation by $\pi$. Thus, each step $Q$ performs a rotation by $\pi - 2\beta$. 

Let $\theta = \frac{\pi}{2} - \beta$, the angle between $U|0\rangle$ and $|\psi_B\rangle$, so $\sin\theta = g_0$ as it did in the analyses of the 
previous sections. Each iteration of $Q$ rotates the state by $2\theta$, so the angle after $i$ steps is $(2i + 1)\theta$. 
As before, the amplitude in the good states after $i$ steps is given by $g_i = \sin((2i + 1)\theta)$. We solve 
for the optimal number of iterations just as we did at the end of section 9.1.4. 
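
The geometric picture agrees with the recurrence relations of section 9.1.4. As a short worked check (a sketch using only the double-angle identities and the definitions above), substituting $\sin\theta = \sqrt{t}$ into the recurrence shows that $Q$ acts on the coordinates $(g_i, b_i)$ as a rotation by $2\theta$:

```latex
% With \sin\theta = \sqrt{t} and \cos\theta = \sqrt{1-t}:
%   \cos 2\theta = 1 - 2\sin^2\theta = 1 - 2t,
%   \sin 2\theta = 2\sin\theta\cos\theta = 2\sqrt{t(1-t)}.
\begin{pmatrix} g_{i+1} \\ b_{i+1} \end{pmatrix}
=
\begin{pmatrix}
  1 - 2t & 2\sqrt{t(1-t)} \\
  -2\sqrt{t(1-t)} & 1 - 2t
\end{pmatrix}
\begin{pmatrix} g_i \\ b_i \end{pmatrix}
=
\begin{pmatrix}
  \cos 2\theta & \sin 2\theta \\
  -\sin 2\theta & \cos 2\theta
\end{pmatrix}
\begin{pmatrix} g_i \\ b_i \end{pmatrix}.
% Starting from (g_0, b_0) = (\sin\theta, \cos\theta), the angle-addition
% formulas give (g_i, b_i) = (\sin((2i+1)\theta), \cos((2i+1)\theta)).
```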

9.3 Optimality of Grover's Algorithm 

As important as Grover’s algorithm itself is the proof that Grover’s algorithm is as good as any 
possible quantum algorithm for exhaustive search. Even before Grover discovered his algorithm, 
researchers had proved a lower bound on the query complexity of any possible quantum algorithm 
for exhaustive search: no quantum algorithm can use fewer than $\Omega(\sqrt{N})$ calls to the predicate $U_P$. 
Thus, Grover’s algorithm is optimal. This result places a severe limit on what quantum computers 
can ever hope to do. 

The exponential size of the quantum state space gives naive hope that quantum computers 
could provide an exponential speedup for all computations; popular press accounts of quantum 
computers still widely make this claim. A less naive guess would be that quantum computers can 
provide exponential speedup for any computation that can be parallelized and requires only a 
single answer output. But the optimality of Grover’s algorithm shows that even that hope is too 
optimistic; exhaustive search is easily parallelized and requires only a single answer, but quantum 
computers can provide only a relatively small speedup. This section sketches a proof of optimality 
in the case of a single solution $x$. The proof bounds the number of calls to the oracle $U_P$. The
argument generalizes to the case of multiple solutions. 

Section 7.4.2 shows how $S_x^{\pi}$ can be computed from $U_P$. We use $S_x^{\pi}$ as the interface to the oracle.
We do not lose any generality in doing so; the process of computing $S_x^{\pi}$ from $U_P$ is reversible, so
any algorithm using $S_x^{\pi}$ could be rewritten in terms of $U_P$ and vice versa.

Since the oracle $U_P$ provides us with the only way to access any information about the element
$x$ we are searching for, an arbitrary quantum search algorithm can be viewed as an algorithm that
alternates between unitary transformations independent of $x$ and calls to $S_x^{\pi}$; any quantum search
algorithm can be written as

$$|\psi_k^x\rangle = U_k S_x^{\pi} U_{k-1} S_x^{\pi} \cdots U_1 S_x^{\pi} U_0 |0\rangle,$$

where the $U_i$ are unitary transformations that do not depend on $x$. The argument does not change
if we allow the use of additional qubits; we simply use $I \otimes S_x^{\pi}$ instead of $S_x^{\pi}$ and, since $N$ is now
larger, the algorithm will be less efficient.

It is important to recognize that the algorithm must work no matter which x is the solution. For 
any particular x, there are transformations that find x very quickly. We want an algorithm that 
finds x quickly no matter what x is. Any search algorithm worth the name must return x with 
reasonable probability for all possible values of x. We consider only quantum search algorithms 
that return $x$ with probability at least $p = 1/2$. It is easy for the reader to check that any value
$0 < p < 1$ results in an $\Omega(\sqrt{N})$ bound, just with a different constant. More formally, we will
show that if the state $|\psi_k^x\rangle$, obtained after $k$ steps of the form $U_i S_x^{\pi}$, satisfies

$$|\langle x|\psi_k^x\rangle|^2 \geq \frac{1}{2}$$

for all $x$, then $k$ must be $\Omega(\sqrt{N})$.

This paragraph describes the rough strategy and intuition behind the proof. The requirement
that the algorithm work for any $x$ means that if the oracle interface is $S_x^{\pi}$, then the result of applying
the algorithm, $U_k S_x^{\pi} U_{k-1} S_x^{\pi} \cdots U_1 S_x^{\pi} U_0|0\rangle$, must be a state $|\psi_k^x\rangle$ sufficiently close to $|x\rangle$ that $x$
will be obtained upon measurement with high probability. Since two elements of the standard
basis $|x\rangle$ and $|y\rangle$ cannot be closer than a certain constant, the final states of the algorithm for
different $S_x^{\pi}$ and $S_y^{\pi}$ must be sufficiently far apart. Since the $U_i$ are all the same, any difference in
the result of running the algorithm arises from calls to $S_x^{\pi}$. The algorithms all start with the same
state $U_0|0\rangle$, so if we can bound from above the amount each step increases the distance between
$|\psi_i^x\rangle$ and $|\psi_i^y\rangle$, then we can obtain a bound on $k$, the number of calls to the oracle interface $S_x^{\pi}$.
In other words, we want to bound from above the amount this distance can increase by applying
$U_i S_x^{\pi}$ to $|\psi_{i-1}^x\rangle$ and $U_i S_y^{\pi}$ to $|\psi_{i-1}^y\rangle$. To obtain this bound, we compare both $|\psi_i^x\rangle$ and $|\psi_i^y\rangle$ with
$|\psi_i\rangle$, the state obtained by applying $U_0$ up through $U_i$ without any intervening calls to $S_x^{\pi}$. We
first give the details of how to use inequalities based on these ideas to prove that $\Omega(\sqrt{N})$ calls to
the oracle are required, and then give detailed proofs of each of the inequalities.

9.3.1 Reduction to Three Inequalities 

The proof considers the relation between three classes of quantum states: the desired result $|x\rangle$,
the state of the computation $|\psi_k^x\rangle$ after $k$ steps, and the state $|\psi_k\rangle = U_k U_{k-1} \cdots U_1 U_0|0\rangle$ obtained
by performing the sequence of transformations $U_i$ without consulting the oracle. The analysis
simplifies if we sometimes consider, instead of $|x\rangle$, a phase-adjusted version of $|x\rangle$, namely
$|x'_k\rangle = e^{i\theta_k^x}|x\rangle$, where $e^{i\theta_k^x} = \langle x|\psi_k^x\rangle / |\langle x|\psi_k^x\rangle|$. The phase adjustment is chosen so that $\langle x'_k|\psi_k^x\rangle$
is positive real for all $k$. Since $|x'_k\rangle$ differs from $|x\rangle$ only in a phase, whenever $|\langle x|\psi_k^x\rangle|^2 \geq \frac{1}{2}$, we
have a similar inequality for $|x'_k\rangle$, namely

$$|\langle x'_k|\psi_k^x\rangle|^2 \geq \frac{1}{2},$$

in which case $\langle x'_k|\psi_k^x\rangle \geq \frac{1}{\sqrt{2}}$.
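The phase adjustment can be checked on a single complex number. The sketch below, with a hypothetical function name, takes a value for the overlap $\langle x|\psi_k^x\rangle$, forms the phase $e^{i\theta_k^x}$, and returns the adjusted overlap $\langle x'_k|\psi_k^x\rangle$, which should be real and equal to the magnitude of the original overlap.

```python
def phase_adjusted_overlap(overlap):
    """Given <x|psi>, return (phase, <x'|psi>) where |x'> = phase * |x>.

    The overlap must be nonzero; the phase is overlap / |overlap|.
    """
    phase = overlap / abs(overlap)            # e^{i theta}
    # <x'| = conj(phase) <x|, so the adjusted overlap is conj(phase) * overlap.
    adjusted = phase.conjugate() * overlap
    return phase, adjusted
```

For instance, an overlap of `0.6 - 0.5j` has squared magnitude 0.61, so the adjusted overlap is a positive real number above $1/\sqrt{2}$, as the inequality requires.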

We consider the distances between certain pairs of these states:

$$\begin{aligned}
d_{kx} &= \| |\psi_k^x\rangle - |\psi_k\rangle \| \\
a_{kx} &= \| |\psi_k^x\rangle - |x'_k\rangle \| \\
c_{kx} &= \| |x'_k\rangle - |\psi_k\rangle \|.
\end{aligned}$$

The proof establishes bounds involving the sum, or average, of these distances squared:

$$D_k = \frac{1}{N}\sum_x d_{kx}^2, \qquad A_k = \frac{1}{N}\sum_x a_{kx}^2, \qquad C_k = \frac{1}{N}\sum_x c_{kx}^2.$$

The reason for considering the sum, or equivalently the average, is that any generally useful
search algorithm must efficiently find $x$ for all possible $x$. The proof relies on three inequalities
involving $D_k$, $A_k$, and $C_k$, which we will prove in section 9.3.2. Before proving the inequalities,
we describe them and show how they imply a lower bound on the number of calls to the
oracle.

The first inequality bounds from above $A_k$, the average squared distance between the state
$|\psi_k^x\rangle$ obtained after $k$ steps and the phase-adjusted solution state $|x'_k\rangle$; section 9.3.2 shows
that in order to obtain a success probability of $|\langle x|\psi_k^x\rangle|^2 \geq \frac{1}{2}$, the following inequality must
hold:

$$A_k \leq 2 - \sqrt{2}.$$

The second inequality bounds $C_k$, the average of the squared distances between the vector $|\psi_k\rangle$ and
the phase-adjusted basis vectors $|x'_k\rangle$, from below as long as $N \geq 4$:

$$C_k \geq 1.$$

The third inequality bounds the growth of $D_k$, the average squared distance between $|\psi_k^x\rangle$ and
$|\psi_k\rangle$, as $k$ increases:

$$D_k \leq \frac{4k^2}{N}.$$

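The three quantities $d_{kx}$, $a_{kx}$, and $c_{kx}$ are related, by the triangle inequality, as follows:

$$d_{kx} = \| |\psi_k^x\rangle - |\psi_k\rangle \| = \| |\psi_k^x\rangle - e^{i\theta_k^x}|x\rangle + e^{i\theta_k^x}|x\rangle - |\psi_k\rangle \| \geq |a_{kx} - c_{kx}|.$$

To relate the quantities $D_k$, $A_k$, and $C_k$, we use the Cauchy-Schwarz inequality (see box 9.1)
to obtain

$$\begin{aligned}
D_k = \frac{1}{N}\sum_x d_{kx}^2
&\geq \frac{1}{N}\Big(\sum_x a_{kx}^2 - 2\sum_x a_{kx}c_{kx} + \sum_x c_{kx}^2\Big) \\
&\geq \frac{1}{N}\sum_x a_{kx}^2 - \frac{2}{N}\sqrt{\sum_x a_{kx}^2}\,\sqrt{\sum_x c_{kx}^2} + \frac{1}{N}\sum_x c_{kx}^2 \\
&= A_k - 2\sqrt{A_k C_k} + C_k.
\end{aligned}$$

Box 9.1

The Cauchy-Schwarz Inequality

We use the Cauchy-Schwarz inequality in two forms, the general form

$$\sum_i u_i v_i \leq \sqrt{\sum_i u_i^2}\,\sqrt{\sum_i v_i^2} \tag{9.1}$$

and a specialization for $v_i = 1$ in an $N$-dimensional space

$$\sum_i u_i \leq \sqrt{N}\,\sqrt{\sum_i u_i^2}. \tag{9.2}$$

Making use of this inequality and the three earlier ones, we bound $\frac{4k^2}{N}$ from below by a constant:

$$\frac{4k^2}{N} \geq D_k \geq A_k - 2\sqrt{A_k C_k} + C_k = \big(\sqrt{C_k} - \sqrt{A_k}\big)^2 \geq \Big(1 - \sqrt{2 - \sqrt{2}}\Big)^2,$$

since $1 > 2 - \sqrt{2} \geq A_k$. Thus, for $N \geq 4$ (needed for the second inequality), and taking $c =
1 - \sqrt{2 - \sqrt{2}}$, at least $k \geq \frac{c}{2}\sqrt{N}$ iterations are required for a success probability of $|\langle x|\psi_k^x\rangle|^2 \geq \frac{1}{2}$
for all $x$.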
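To get a feel for the constants, the following sketch evaluates the lower bound $\frac{c}{2}\sqrt{N}$ with $c = 1 - \sqrt{2 - \sqrt{2}}$ next to the number of iterations Grover's algorithm uses for a single solution. The rounding rule for the Grover count (nearest integer to $\frac{\pi}{4\theta} - \frac{1}{2}$ with $\sin\theta = 1/\sqrt{N}$) is an assumption made for this illustration, and the function names are hypothetical.

```python
import math

# c = 1 - sqrt(2 - sqrt(2)), the constant appearing in the lower bound.
C_BOUND = 1 - math.sqrt(2 - math.sqrt(2))

def min_oracle_calls(n_items):
    """Lower bound (c/2) * sqrt(N) on oracle calls for success probability >= 1/2."""
    return C_BOUND / 2 * math.sqrt(n_items)

def grover_iterations(n_items):
    """Iterations used by Grover's algorithm for one solution (assumed rounding)."""
    theta = math.asin(1 / math.sqrt(n_items))
    return round(math.pi / (4 * theta) - 0.5)
```

Both quantities grow as $\sqrt{N}$; only the constant factor differs, which is why the bound shows Grover's algorithm to be optimal up to a constant.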

We now turn to the proofs of the three inequalities. 


9.3.2 Proofs of the Three Inequalities 

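The inequality for $A_k$. By assumption, $|\langle\psi_k^x|x\rangle|^2 \geq \frac{1}{2}$. By the choice of phase $e^{i\theta_k^x}$ relating $|x\rangle$
and $|x'_k\rangle$,

$$\langle x'_k|\psi_k^x\rangle \geq \frac{1}{\sqrt{2}},$$

so

$$\begin{aligned}
a_{kx}^2 &= \| |\psi_k^x\rangle - |x'_k\rangle \|^2 \\
&= \| |\psi_k^x\rangle \|^2 - 2\langle x'_k|\psi_k^x\rangle + \| |x'_k\rangle \|^2 \\
&\leq 2 - \sqrt{2},
\end{aligned}$$

from which it follows that

$$A_k = \frac{1}{N}\sum_x a_{kx}^2 \leq 2 - \sqrt{2}.$$

Bound on sum squared distance to all basis vectors. The terms $c_{kx}^2$ can be bounded as follows:

$$\begin{aligned}
c_{kx}^2 &= \| |x'_k\rangle - |\psi_k\rangle \|^2 \\
&= \| e^{i\theta_k^x}|x\rangle - |\psi_k\rangle \|^2 \\
&= \| |\psi_k\rangle \|^2 - e^{i\theta_k^x}\langle\psi_k|x\rangle - \overline{e^{i\theta_k^x}\langle\psi_k|x\rangle} + \| |x\rangle \|^2 \\
&= 2 - 2\,\mathrm{Re}\big(e^{i\theta_k^x}\langle\psi_k|x\rangle\big) \\
&\geq 2 - 2|\langle x|\psi_k\rangle|.
\end{aligned}$$

We can now bound the average of these terms:

$$\begin{aligned}
C_k = \frac{1}{N}\sum_x c_{kx}^2 &\geq 2 - \frac{2}{N}\sum_x |\langle x|\psi_k\rangle| \\
&\geq 2 - \frac{2}{\sqrt{N}}\sqrt{\sum_x |\langle x|\psi_k\rangle|^2} \qquad (9.3) \\
&= 2 - \frac{2}{\sqrt{N}}, \qquad (9.4)
\end{aligned}$$

where inequality 9.3 follows from the Cauchy-Schwarz inequality (box 9.1), and equation 9.4
holds because $|\psi_k\rangle$ is a unit vector and $\{|x\rangle\}$ forms a basis. Thus, the second inequality $C_k \geq 1$
holds as long as $N \geq 4$.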

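As an aside, since this argument made no assumption about $|\psi_k\rangle$, the bound on the average of the
squared distances to all basis vectors holds for any quantum state:

$$\frac{1}{N}\sum_x \| |x\rangle - |\psi\rangle \|^2 \geq 2 - \frac{2}{\sqrt{N}}$$

for any $|\psi\rangle$.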
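The aside can be spot-checked numerically. The sketch below draws random unit vectors (the sampling scheme and function names are assumptions made for this illustration) and computes the average squared distance from the state to every standard basis vector, which should never fall below $2 - 2/\sqrt{N}$.

```python
import math
import random

def random_state(n, rng):
    """A random complex unit vector of dimension n (Gaussian components, normalized)."""
    amps = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in amps))
    return [a / norm for a in amps]

def average_squared_distance_to_basis(state):
    """(1/N) * sum over x of || |x> - |psi> ||^2, computed component by component."""
    n = len(state)
    total = 0.0
    for x in range(n):
        total += sum(abs((1 if y == x else 0) - a) ** 2
                     for y, a in enumerate(state))
    return total / n

def lower_bound(n):
    """The bound 2 - 2/sqrt(N) from the text."""
    return 2 - 2 / math.sqrt(n)
```

The uniform superposition with all amplitudes equal to $1/\sqrt{N}$ attains the bound exactly, so the inequality is tight.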


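The inequality for $D_k$. First, we bound how much the distance between $|\psi_k^x\rangle$ and $|\psi_k\rangle$ can increase
each step. Consider the following relation between $d_{kx}$ and $d_{k+1,x}$:

$$\begin{aligned}
d_{k+1,x} &= \| |\psi_{k+1}^x\rangle - |\psi_{k+1}\rangle \| \\
&= \| U_{k+1} S_x^{\pi}|\psi_k^x\rangle - U_{k+1}|\psi_k\rangle \| \\
&= \| S_x^{\pi}|\psi_k^x\rangle - |\psi_k\rangle \| \\
&= \| S_x^{\pi}(|\psi_k^x\rangle - |\psi_k\rangle) + (S_x^{\pi} - I)|\psi_k\rangle \| \\
&\leq \| S_x^{\pi}(|\psi_k^x\rangle - |\psi_k\rangle) \| + \| (S_x^{\pi} - I)|\psi_k\rangle \| \\
&= d_{kx} + 2|\langle x|\psi_k\rangle|.
\end{aligned}$$

The last step uses the fact that $S_x^{\pi}$ is unitary and that $S_x^{\pi} - I = -2|x\rangle\langle x|$.
This inequality shows that with each step the distance between $|\psi_k^x\rangle$ and $|\psi_k\rangle$ can increase by at
most $2|\langle x|\psi_k\rangle|$. Using this bound, we prove by induction that

$$D_k = \frac{1}{N}\sum_x d_{kx}^2 \leq \frac{4k^2}{N}.$$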


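Base case. For $k = 0$, for all $x$, we have $|\psi_0^x\rangle = U_0|0\rangle = |\psi_0\rangle$, so $d_{0x} = 0$ and therefore $D_0 = 0$.

Induction step.

$$\begin{aligned}
D_{k+1} &= \frac{1}{N}\sum_x d_{k+1,x}^2 \\
&\leq \frac{1}{N}\sum_x \big(d_{kx} + 2|\langle x|\psi_k\rangle|\big)^2 \\
&= \frac{1}{N}\sum_x d_{kx}^2 + \frac{4}{N}\sum_x |\langle x|\psi_k\rangle|^2 + \frac{4}{N}\sum_x d_{kx}|\langle x|\psi_k\rangle| \\
&= D_k + \frac{4}{N} + \frac{4}{N}\sum_x d_{kx}|\langle x|\psi_k\rangle|.
\end{aligned}$$

The Cauchy-Schwarz inequality gives

$$\frac{1}{N}\sum_x d_{kx}|\langle x|\psi_k\rangle| \leq \frac{1}{N}\sqrt{\sum_x d_{kx}^2}\,\sqrt{\sum_x |\langle x|\psi_k\rangle|^2} = \sqrt{\frac{D_k}{N}}.$$

Using the induction assumption $D_k \leq \frac{4k^2}{N}$, we have

$$D_{k+1} \leq D_k + \frac{4}{N} + 4\sqrt{\frac{D_k}{N}} \leq \frac{4k^2}{N} + \frac{4}{N} + \frac{8k}{N} = \frac{4(k+1)^2}{N}.$$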
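Grover's algorithm is itself one instance of the general form analyzed here: take $U_0|0\rangle$ to be the uniform superposition and every later $U_i$ to be inversion about the mean. The following sketch, a small state-vector simulation with hypothetical function names, computes the average squared distance $D_k$ between the run with oracle calls and the run without them, so it can be compared with the bound $4k^2/N$.

```python
import math

def oracle(state, x):
    """S_x^pi: flip the sign of the amplitude of basis state x."""
    out = list(state)
    out[x] = -out[x]
    return out

def diffusion(state):
    """Inversion about the mean, used here as every U_i with i >= 1."""
    mean = sum(state) / len(state)
    return [2 * mean - a for a in state]

def distance(a, b):
    """Euclidean distance between two real state vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def average_squared_distance(n_items, k):
    """D_k: average over x of the squared distance between the two runs."""
    uniform = [1 / math.sqrt(n_items)] * n_items
    reference = list(uniform)              # run without oracle calls
    for _ in range(k):
        reference = diffusion(reference)
    total = 0.0
    for x in range(n_items):
        state = list(uniform)              # run with oracle S_x^pi
        for _ in range(k):
            state = diffusion(oracle(state, x))
        total += distance(state, reference) ** 2
    return total / n_items
```

In this instance the bound is met with equality at $k = 1$, where $D_1 = 4/N$.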


9.4 Derandomization of Grover's Algorithm and Amplitude Amplification 

Unlike Shor’s algorithm, Grover’s algorithm is not inherently probabilistic. With a little cleverness,
Grover’s algorithm can be modified in such a way that it is guaranteed to find a solution
while still preserving the quadratic speedup. More generally, amplitude amplification can be
derandomized. Brassard, Høyer, and Tapp suggest two approaches. In the first, each iteration
rotates by an angle that is slightly smaller than the one used in section 9.2.1, while the
second changes only the last step to a smaller rotation. This section describes each approach in
turn.


9.4.1 Approach 1: Modifying Each Step 

Suppose the angle $\theta$ in Grover’s algorithm or amplitude amplification happened to be such that
$\frac{\pi}{4\theta} - \frac{1}{2}$ were an integer. In this case, after $i = \frac{\pi}{4\theta} - \frac{1}{2}$ iterations, the amplitude $g_i$ would be 1 and the
algorithm would output a solution with certainty. Recall from section 9.2 that $\theta$ satisfies $\sin\theta =
\sqrt{t} = g_0$. To derandomize amplitude amplification for an algorithm $U$ with initial success amplitude $g_0$,
we modify $U$ to obtain an algorithm $U'$ with success amplitude $g_0' < g_0$ such that, for $\theta'$ satisfying
$\sin\theta' = g_0'$, the quantity $\frac{\pi}{4\theta'} - \frac{1}{2}$ is an integer.

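Intuitively, it seems as though it should not be hard to modify an algorithm $U$ so that it is less
successful, but we must make sure that we can compute such a $U'$ efficiently from $U$. The trick is
to allow the use of an additional qubit $b$. Given an algorithm $U$ with success amplitude $g_0$ acting
on an $n$-qubit register $|s\rangle$, define $U'$ to be the transformation $U \otimes B$ on an $(n+1)$-qubit register
$|s\rangle|b\rangle$, where $B$ is a single-qubit transformation that takes $|0\rangle$ to a state whose amplitude
on $|1\rangle$ is $g_0'/g_0$, for example

$$B = \begin{pmatrix} \sqrt{1 - (g_0'/g_0)^2} & -g_0'/g_0 \\ g_0'/g_0 & \sqrt{1 - (g_0'/g_0)^2} \end{pmatrix}.$$

Let $G'$ be the set of basis states $|x\rangle \otimes |b\rangle$ such that $|x\rangle \in G$ and $|b\rangle = |1\rangle$. The reader may check
that the initial success probability $|P_{G'} U'|0\rangle|^2$ is indeed $g_0'^2$. Amplitude amplification, now on an
$(n+1)$-qubit state, with $U'$ for $U$, $S_{G'}^{\pi}$ for $S_G^{\pi}$, and iteration operator $Q' = -U' S_0^{\pi} (U')^{-1} S_{G'}^{\pi}$,
succeeds with certainty after $i = \frac{\pi}{4\theta'} - \frac{1}{2}$ steps.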

This modified algorithm obtains a solution with certainty, using $O(\sqrt{1/t})$ calls to the oracle, at
the cost of a single additional qubit.
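The choice of the reduced amplitude can be made concrete. The sketch below, whose function name and return convention are assumptions for illustration, picks the smallest whole number of iterations $i \geq \frac{\pi}{4\theta} - \frac{1}{2}$, sets $\theta' = \frac{\pi}{4i + 2}$ so that $\frac{\pi}{4\theta'} - \frac{1}{2} = i$ exactly, and reports the reduced amplitude $g_0' = \sin\theta'$ together with the ratio $g_0'/g_0$ that the extra qubit must supply.

```python
import math

def derandomized_parameters(g0):
    """Return (iterations, g0_prime, ratio) for approach 1.

    iterations: smallest integer i with i >= pi/(4*theta) - 1/2
    g0_prime:   sin(theta') where theta' = pi / (4*i + 2), so g0_prime <= g0
    ratio:      g0_prime / g0, the amplitude the extra qubit needs on |1>
    """
    theta = math.asin(g0)
    iterations = math.ceil(math.pi / (4 * theta) - 0.5)
    theta_prime = math.pi / (4 * iterations + 2)
    g0_prime = math.sin(theta_prime)
    return iterations, g0_prime, g0_prime / g0
```

With these parameters the final amplitude $\sin((2i+1)\theta')$ equals $\sin(\pi/2) = 1$, so the measurement yields a solution with certainty.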


9.4.2 Approach 2: Modifying Only the Last Step 

This approach is more complicated to describe, but results in a solution in $O(\sqrt{1/t})$ time with
certainty without the need for an additional qubit. The idea is to modify $S_G^{\pi}$ and $S_0^{\pi}$ in the last
step so that exactly the desired final state is obtained. To this end, we begin by analyzing general
properties of transformations of the form

$$Q(\phi, \tau) = -U S_0^{\phi} U^{-1} S_G^{\tau},$$

where $\phi$ and $\tau$ are both arbitrary angles and

$$S_X^{\phi}|x\rangle = \begin{cases} e^{i\phi}|x\rangle & \text{if } |x\rangle \in X \\ |x\rangle & \text{if } |x\rangle \notin X. \end{cases}$$

Section 7.4.2 showed how to implement $S_X^{\phi}$ efficiently.
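First, we show that for any quantum state $|v\rangle$,

$$U S_0^{\phi} U^{-1}|v\rangle = |v\rangle - (1 - e^{i\phi})\,\overline{\langle v|U|0\rangle}\,U|0\rangle.$$

Write

$$|v\rangle = \sum_{i=1}^{N-1} \overline{\langle v|U|i\rangle}\,U|i\rangle + \overline{\langle v|U|0\rangle}\,U|0\rangle.$$

Then

$$\begin{aligned}
U S_0^{\phi} U^{-1}|v\rangle
&= U S_0^{\phi} U^{-1}\Big(\sum_{i=1}^{N-1} \overline{\langle v|U|i\rangle}\,U|i\rangle + \overline{\langle v|U|0\rangle}\,U|0\rangle\Big) \\
&= U S_0^{\phi}\Big(\sum_{i=1}^{N-1} \overline{\langle v|U|i\rangle}\,|i\rangle + \overline{\langle v|U|0\rangle}\,|0\rangle\Big) \\
&= \sum_{i=1}^{N-1} \overline{\langle v|U|i\rangle}\,U|i\rangle + e^{i\phi}\,\overline{\langle v|U|0\rangle}\,U|0\rangle \\
&= |v\rangle - (1 - e^{i\phi})\,\overline{\langle v|U|0\rangle}\,U|0\rangle.
\end{aligned}$$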
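The identity for $U S_0^{\phi} U^{-1}|v\rangle$ can be checked numerically on a single qubit. The sketch below builds a $2 \times 2$ unitary from three angles (this parameterization and all function names are assumptions made for the illustration), applies $U S_0^{\phi} U^{-1}$ directly, and compares the result with $|v\rangle - (1 - e^{i\phi})\,\overline{\langle v|U|0\rangle}\,U|0\rangle$.

```python
import cmath
import math

def make_unitary(a, lam, mu):
    """A 2x2 unitary parameterized by three angles (illustrative choice)."""
    return [[complex(math.cos(a)), -cmath.exp(1j * lam) * math.sin(a)],
            [cmath.exp(1j * mu) * math.sin(a),
             cmath.exp(1j * (lam + mu)) * math.cos(a)]]

def apply(matrix, vec):
    """Matrix-vector product."""
    return [sum(matrix[r][c] * vec[c] for c in range(len(vec)))
            for r in range(len(matrix))]

def dagger(matrix):
    """Conjugate transpose, which is the inverse of a unitary matrix."""
    n = len(matrix)
    return [[matrix[c][r].conjugate() for c in range(n)] for r in range(n)]

def direct(U, phi, v):
    """Compute U S_0^phi U^{-1} |v> step by step."""
    w = apply(dagger(U), v)
    w[0] = cmath.exp(1j * phi) * w[0]      # S_0^phi multiplies |0> by e^{i phi}
    return apply(U, w)

def formula(U, phi, v):
    """Compute |v> - (1 - e^{i phi}) conj(<v|U|0>) U|0>."""
    u0 = [row[0] for row in U]             # the vector U|0>
    overlap = sum(vr.conjugate() * ur for vr, ur in zip(v, u0))   # <v|U|0>
    coeff = (1 - cmath.exp(1j * phi)) * overlap.conjugate()
    return [vr - coeff * ur for vr, ur in zip(v, u0)]
```

The two computations agree component by component, up to floating-point error, for any choice of angles and any input vector.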

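Using this result, we now can see the effect of $Q(\phi, \tau) = -U S_0^{\phi} U^{-1} S_G^{\tau}$ on any superposition
$|v\rangle = g|v_G\rangle + b|v_B\rangle$ in the subspace spanned by $|v_G\rangle$ and $|v_B\rangle$. We have

$$\begin{aligned}
Q(\phi, \tau)|v\rangle = {} & g\big(-e^{i\tau}|v_G\rangle + e^{i\tau}(1 - e^{i\phi})\,\overline{\langle v_G|U|0\rangle}\,U|0\rangle\big) \\
& + b\big(-|v_B\rangle + (1 - e^{i\phi})\,\overline{\langle v_B|U|0\rangle}\,U|0\rangle\big).
\end{aligned}$$

After $s = \lfloor \frac{\pi}{4\theta} - \frac{1}{2} \rfloor$ iterations of amplitude amplification we have the state $|\psi_s\rangle =
\sin((2s+1)\theta)|\psi_G\rangle + \cos((2s+1)\theta)|\psi_B\rangle$, where $\sin\theta = \sqrt{t} = g_0$. Applying $Q(\phi, \tau)$ to the
states $|\psi_G\rangle$ and $|\psi_B\rangle$, and using $U|0\rangle = g_0|\psi_G\rangle + b_0|\psi_B\rangle$, we obtain

$$\begin{aligned}
Q(\phi, \tau)|\psi_G\rangle &= e^{i\tau}\big((1 - e^{i\phi})g_0^2 - 1\big)|\psi_G\rangle + e^{i\tau}(1 - e^{i\phi})g_0 b_0|\psi_B\rangle, \\
Q(\phi, \tau)|\psi_B\rangle &= (1 - e^{i\phi})b_0 g_0|\psi_G\rangle + \big((1 - e^{i\phi})b_0^2 - 1\big)|\psi_B\rangle.
\end{aligned}$$

So

$$Q(\phi, \tau)|\psi_s\rangle = g(\phi, \tau)|\psi_G\rangle + b(\phi, \tau)|\psi_B\rangle,$$

where

$$\begin{aligned}
g(\phi, \tau) &= \sin((2s+1)\theta)\,e^{i\tau}\big((1 - e^{i\phi})g_0^2 - 1\big) + \cos((2s+1)\theta)\,(1 - e^{i\phi})b_0 g_0 \\
b(\phi, \tau) &= \sin((2s+1)\theta)\,e^{i\tau}(1 - e^{i\phi})g_0 b_0 + \cos((2s+1)\theta)\,\big((1 - e^{i\phi})b_0^2 - 1\big).
\end{aligned}$$

Our aim now is to show that there exist $\phi$ and $\tau$ such that if $Q(\phi, \tau) = -U S_0^{\phi} U^{-1} S_G^{\tau}$ is applied as
a final step, a solution is obtained with certainty.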

To show that $\phi$ and $\tau$ can be chosen so that $Q(\phi, \tau)|\psi_s\rangle$ has all of its amplitude in the good
states, we want $b(\phi, \tau) = 0$ or

$$\sin((2s+1)\theta)\,e^{i\tau}(1 - e^{i\phi})g_0 b_0 + \cos((2s+1)\theta)\,\big((1 - e^{i\phi})b_0^2 - 1\big) = 0,$$

or

$$e^{i\tau}(1 - e^{i\phi})\,g_0\sqrt{1 - g_0^2}\,\sin((2s+1)\theta) = \big(1 - (1 - e^{i\phi})(1 - g_0^2)\big)\cos((2s+1)\theta),$$

since $b_0 = \sqrt{1 - g_0^2}$. Since the right-hand side equals

$$\big(g_0^2(1 - e^{i\phi}) + e^{i\phi}\big)\cos((2s+1)\theta),$$

we want $\phi$ and $\tau$ to satisfy

$$\cot((2s+1)\theta) = \frac{e^{i\tau}(1 - e^{i\phi})\,g_0\sqrt{1 - g_0^2}}{g_0^2(1 - e^{i\phi}) + e^{i\phi}}. \tag{9.5}$$

Once $\phi$ is chosen, we choose $\tau$ to make the right-hand side real. To find $\phi$, compute the
magnitude squared of the right-hand side of equation 9.5:

$$\frac{g_0^2 b_0^2 (2 - 2\cos\phi)}{g_0^4(2 - 2\cos\phi) - g_0^2(2 - 2\cos\phi) + 1}.$$

The maximum value of the magnitude squared, obtained when $\cos\phi = -1$, is

$$\frac{4 g_0^2 b_0^2}{4 g_0^4 - 4 g_0^2 + 1} = \frac{4 g_0^2 b_0^2}{(2 g_0^2 - 1)^2}.$$

So the maximum magnitude is

$$\left|\frac{2 g_0 b_0}{2 g_0^2 - 1}\right| = \left|\frac{2 g_0 b_0}{g_0^2 - b_0^2}\right| = \tan(2\theta),$$

where $\sin\theta = \sqrt{t} = g_0$ as before. Thus, $\phi$ and $\tau$ can be chosen to make the right-hand side of
equation 9.5 any real number in the interval $[0, \tan(2\theta)]$. By the geometric interpretation of section
9.2.1, after $s = \lfloor \frac{\pi}{4\theta} - \frac{1}{2} \rfloor$ iterations, the state has been rotated to within $2\theta$ of the desired state.
Thus, we have shown that $\phi$ and $\tau$ can be chosen so that applying $s$ iterations of $Q$, followed by
one application of $Q(\phi, \tau)$, yields a solution with certainty.
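The existence argument can be turned into a small calculation. The sketch below solves equation 9.5 for $\phi$ by matching the magnitude of the right-hand side to $\cot((2s+1)\theta)$, then picks $\tau$ to cancel the remaining phase, and finally evaluates the amplitudes $g(\phi, \tau)$ and $b(\phi, \tau)$. The function names and the closed form used for $2 - 2\cos\phi$ are derived here for illustration and are not taken from the text.

```python
import cmath
import math

def final_step_angles(g0):
    """Return (s, phi, tau) so that s steps of Q plus one Q(phi, tau) succeed."""
    theta = math.asin(g0)
    b0 = math.sqrt(1 - g0 ** 2)
    s = math.floor(math.pi / (4 * theta) - 0.5)
    alpha = (2 * s + 1) * theta
    target = math.cos(alpha) / math.sin(alpha)      # cot((2s+1) theta) >= 0
    # With u = 2 - 2 cos(phi), the squared magnitude of the right-hand side of
    # equation 9.5 is g0^2 b0^2 u / (1 - g0^2 b0^2 u); set it equal to target^2.
    u = target ** 2 / (g0 ** 2 * b0 ** 2 * (1 + target ** 2))
    phi = math.acos(max(-1.0, 1 - u / 2))
    k = 1 - cmath.exp(1j * phi)
    ratio = k * g0 * b0 / (g0 ** 2 * k + cmath.exp(1j * phi))
    tau = -cmath.phase(ratio)                        # makes e^{i tau} * ratio real
    return s, phi, tau

def final_amplitudes(g0, s, phi, tau):
    """Evaluate g(phi, tau) and b(phi, tau) from the text."""
    theta = math.asin(g0)
    b0 = math.sqrt(1 - g0 ** 2)
    alpha = (2 * s + 1) * theta
    k = 1 - cmath.exp(1j * phi)
    good = (math.sin(alpha) * cmath.exp(1j * tau) * (k * g0 ** 2 - 1)
            + math.cos(alpha) * k * b0 * g0)
    bad = (math.sin(alpha) * cmath.exp(1j * tau) * k * g0 * b0
           + math.cos(alpha) * (k * b0 ** 2 - 1))
    return good, bad
```

For a sample amplitude such as $g_0 = 0.3$, the bad amplitude vanishes up to rounding and the good amplitude has magnitude 1, in line with the claim that the final step can be tuned to succeed with certainty.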


9.5 Unknown Number of Solutions 


Grover’s algorithm requires that we know the relative number of solutions $t = |G|/N$ in order
to determine how many times we should apply the transformation $Q$. More generally, amplitude
amplification requires as input the success probability $t = |g_0|^2$ of $U|0\rangle$. This section sketches
two approaches to handling cases in which we do not know $t$. The first approach repeats Grover’s
algorithm multiple times, choosing a random number of iterations of $Q$ in each run. While
inelegant, this approach does succeed in finding a solution with high probability. The second
approach, called quantum counting, uses the quantum Fourier transform to estimate $t$. Both
approaches require $O(\sqrt{N})$ calls to $U_P$.


9.5.1 Varying the Number of Iterations 

Consider Grover’s algorithm applied to a problem with $tN$ solutions in a space of cardinality
$N$. When $t$ is unknown, a simple strategy is to repeatedly execute Grover’s algorithm with a
number of iteration steps picked randomly between 0 and $\frac{\pi}{4}\sqrt{N}$. For large values of $t$, this simple
approach is clearly not optimal. Nevertheless, as we show, this simple strategy succeeds with at
most $O(\sqrt{N})$ calls to $U_P$ regardless of the value of $t$.

The results of section 9.1.4 imply that the average probability of success for a run with $i$
iterations of $Q$, where $i$ is randomly chosen between 0 and $r$, is given by

$$\Pr(i \leq r) = \frac{1}{r+1}\sum_{i=0}^{r} \sin^2((2i+1)\theta),$$

where $\sin\theta = \sqrt{t}$ as before. A plot of the average success probability for different values of $r$ is
shown in figure 9.6. The graph will be identical for all values of $t$ as long as $t \ll 1$. For comparison,
the graph of the success probability after exactly $r$ iteration steps of Grover’s algorithm is also
given.

It is easy to see from the graph of this function that there is a constant $c$ such that $\Pr(i \leq r) \geq c$
for all $r \geq \frac{\pi}{4\sqrt{t}}$. For $t \geq \frac{1}{N}$, guaranteeing at least one solution, if we choose $r = \frac{\pi}{4}\sqrt{N}$, then
$\Pr(i \leq \frac{\pi}{4}\sqrt{N}) \geq c$. Thus, a single run of the algorithm, where the number of iterations of $Q$
is chosen randomly between 0 and $\frac{\pi}{4}\sqrt{N}$, finds the solution with probability at least $c$. The
expected number of calls to the oracle during such a run is therefore $O(\sqrt{N})$. For any probability
$c' > c$, there is a constant $K$ such that if Grover’s algorithm is run $K$ times, with the number of
iterations for each run chosen as above, then a solution will be found with probability at least $c'$. Thus,
for any $c'$, the total number of times $Q$ is applied, and therefore the total number of calls to the
oracle, is $O(\sqrt{N})$.

Figure 9.6

The average success probability $\Pr(i \leq r)$ over runs with a random number of iterations chosen between 0 and $r$ plotted
as a function of $r$ where $\sin\theta = \sqrt{t}$ as usual. For reference, the dotted curve gives the success probability for a run with
exactly $r$ iterations.
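The average success probability is easy to tabulate. The sketch below assumes the number of iterations is drawn uniformly from $0, 1, \ldots, r$ and averages $\sin^2((2i+1)\theta)$ over those values; the function names, and the choice to round $\frac{\pi}{4}\sqrt{N}$ up to an integer, are assumptions made for this illustration.

```python
import math

def average_success(t, r):
    """Average of sin^2((2i+1) theta) over i = 0..r, with sin(theta) = sqrt(t)."""
    theta = math.asin(math.sqrt(t))
    return sum(math.sin((2 * i + 1) * theta) ** 2 for i in range(r + 1)) / (r + 1)

def max_iterations(n_items):
    """Upper end of the random range: (pi/4) sqrt(N), rounded up."""
    return math.ceil(math.pi / 4 * math.sqrt(n_items))
```

Evaluating `average_success(j / 1024, max_iterations(1024))` for a few solution counts `j` gives values that stay bounded away from zero whether there is one solution or several hundred, which is the behavior the argument relies on.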

9.5.2 Quantum Counting 

Instead of repeating Grover’s algorithm with randomly varying numbers of iterations of $Q$, quantum
counting takes a more quantum approach: create a superposition of results for different
numbers of applications of $Q$ and then use the quantum Fourier transform on that superposition
to obtain a good estimate for the relative number of solutions $t$. The same strategy can be used for
the amplitude amplification algorithm to estimate the success probability $t$ of $U|0\rangle$. This approach
also has query complexity $O(\sqrt{N})$.

The algorithm itself is easy to describe, though determining the size of the superposition needed
is more involved. Let $U$ and $Q$ be as defined in the amplitude amplification algorithm of section
9.2. Define a transformation RepeatQ, with input $|k\rangle$ and $|\psi\rangle$, that performs $k$ iterations of $Q$
on $|\psi\rangle$:

$$\mathrm{RepeatQ} : |k\rangle \otimes |\psi\rangle \to |k\rangle \otimes Q^k|\psi\rangle.$$

This transformation is more powerful than the classical ability to repeat $Q$ because RepeatQ can
be applied to a superposition. We apply RepeatQ to a superposition of all $k < M = 2^m$ tensored
with the state $U|0\rangle$ to obtain

$$\frac{1}{\sqrt{M}}\sum_{k=0}^{M-1} |k\rangle \otimes U|0\rangle \to \frac{1}{\sqrt{M}}\sum_{k=0}^{M-1} |k\rangle \otimes \big(g_k|\psi_G\rangle + b_k|\psi_B\rangle\big),$$

where we ignore for the moment how $M$ was chosen.

A measurement of the right register in the standard basis produces a state $|x\rangle$ that is either a
good state (orthogonal to $|\psi_B\rangle$) or a bad state (orthogonal to $|\psi_G\rangle$). Thus, the state of the left
register collapses to either $|\psi\rangle = C\sum_{k=0}^{M-1} b_k|k\rangle$ or $|\psi'\rangle = C'\sum_{k=0}^{M-1} g_k|k\rangle$. Let us suppose the
former state $|\psi\rangle$ is obtained; the reasoning for the latter case is analogous. From section 9.2,
$b_k = \cos((2k+1)\theta)$, so

$$|\psi\rangle = C\sum_{k=0}^{M-1} \cos((2k+1)\theta)\,|k\rangle.$$

Apply the quantum Fourier transform to this state to obtain

$$F : C\sum_{k=0}^{M-1} \cos((2k+1)\theta)\,|k\rangle \to \sum_{j=0}^{M-1} B_j|j\rangle.$$

Section 7.8.1 explained that, for a cosine function of period $\frac{\pi}{\theta}$, most of the amplitude is in those
$B_j$ that are close to the single value $\frac{M\theta}{\pi}$. If we measure the state now, from the measured value $|j\rangle$
we obtain, with high probability, a good approximation of $\theta$ by taking $\theta = \frac{\pi j}{M}$. Thus, with high
probability, the value $t = \sin^2\theta$ is a good approximation for the ratio of solutions in the Grover’s
algorithm case, or the success probability of $U|0\rangle$ in the case of amplitude amplification.

There is, of course, one issue remaining: we do not know a priori a proper value for $M$.
This problem can be addressed by repeating the algorithm for increasing $M$ until a meaningful
value for $j$ is read. Since $\theta = \frac{\pi j}{M}$, for a given $\theta$ we will likely read an integer value $j \approx \frac{M\theta}{\pi}$,
and $j$ will be measured as 0 with high probability when $M$ is chosen too small for the given
problem.
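The estimate can be explored with a purely classical calculation of the Fourier coefficients. The sketch below is not a quantum simulation of the full algorithm: it builds the unnormalized amplitudes $\cos((2k+1)\theta)$, computes a discrete Fourier transform directly, and returns the most likely outcome $j$ rather than sampling a measurement. It searches only $j \leq M/2$, because a real cosine also produces a mirror-image peak near $M - \frac{M\theta}{\pi}$. The function name and these simplifications are assumptions made for illustration.

```python
import cmath
import math

def estimate_t(theta, m_bits):
    """Return (j, t_estimate) from the dominant Fourier coefficient.

    theta:  the true angle, with sin(theta) = sqrt(t)
    m_bits: number of qubits in the counting register, so M = 2**m_bits
    """
    size = 2 ** m_bits
    amps = [math.cos((2 * k + 1) * theta) for k in range(size)]
    best_j, best_weight = 0, -1.0
    for j in range(size // 2 + 1):
        # Unnormalized coefficient B_j of the discrete Fourier transform.
        b_j = sum(a * cmath.exp(2j * math.pi * j * k / size)
                  for k, a in enumerate(amps))
        if abs(b_j) > best_weight:
            best_j, best_weight = j, abs(b_j)
    theta_estimate = math.pi * best_j / size
    return best_j, math.sin(theta_estimate) ** 2
```

With `theta = math.asin(0.25)`, so that $t = 1/16$, and a six-qubit counting register, the dominant index lies next to $\frac{M\theta}{\pi} \approx 5.1$ and the resulting estimate of $t$ is within about 0.004 of the true value. Larger registers narrow the estimate further.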

9.6 Practical Implications of Grover's Algorithm and Amplitude Amplification 

The introduction to this chapter mentioned that there is debate as to the extent of the practical
impact of Grover’s algorithm and its generalization, amplitude amplification. Although the
quadratic reduction in the query complexity provided by Grover’s algorithm and amplitude amplification
over classical algorithms may seem minor compared to the superpolynomial speedup of
Shor’s algorithm, a quadratic speedup can be of practical importance. For example, even though
the fast Fourier transform is only a quadratic speedup over the straightforward way of implementing
the Fourier transform, it is viewed as a significant improvement. That the speedup provided
by Grover’s algorithm is no greater is the least of our concerns in terms of the practical impact of
these algorithms.

A major concern is the efficiency with which $U_P$ can be computed for a given practical problem.
Unless $U_P$ is efficiently computable, the $O(\sqrt{N})$ speedup of the search is swamped by the amount
of time it takes to compute $U_P$. If $U_P$ takes $O(N)$ time to compute, which is true for a generic $P$,
then a run of Grover’s algorithm takes $O(N)$ time, even though it uses $U_P$ only $O(\sqrt{N})$ times.
Furthermore, there are no savings for multiple searches over the same space; the measurement
at the end of the algorithm destroys the superposition, so $U_P$ must be computed afresh for each
search.

Another concern is that most searches done in practice are over spaces with a lot of structure,
which in many cases enables fast classical algorithms that amplitude amplification cannot improve
upon. For example, classical algorithms can find an element of an alphabetical list of $N$ elements
in $O(\log_2 N)$ time. Furthermore, that algorithm is not of a form amenable to speedup by amplitude
amplification. There are relatively few practical search problems for which the search space has
no structure, so Grover’s algorithm on its own has few practical applications. Its generalization,
amplitude amplification, is more widely applicable in that it can be used to speed up certain, but
by no means most, classes of heuristics.
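The gap between structured and unstructured search is easy to quantify. The sketch below uses binary search from the Python standard library for the alphabetical-list example and compares rough step counts; the helper names are hypothetical, and the two counting functions only evaluate the leading terms $\log_2 N$ and $\frac{\pi}{4}\sqrt{N}$ rather than measuring any real system.

```python
import bisect
import math

def find_sorted(sorted_items, target):
    """Binary search: index of target in sorted_items, or -1 if absent."""
    i = bisect.bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1

def classical_sorted_steps(n_items):
    """Roughly ceil(log2 N) comparisons for binary search on a sorted list."""
    return math.ceil(math.log2(n_items))

def grover_style_calls(n_items):
    """Roughly (pi/4) sqrt(N) oracle calls for unstructured search, one solution."""
    return math.ceil(math.pi / 4 * math.sqrt(n_items))
```

For a million entries the first count is about twenty comparisons while the second is several hundred oracle calls, which illustrates why structure, when present, matters more than the quadratic speedup.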

Grover’s algorithm applies to exhaustive search of a search space. It is commonly
called a database search algorithm, but that appellation is misleading. Grover’s search
algorithm gives a speedup only over classical algorithms for unstructured search. Databases,
which are generally highly structured, can be searched rapidly classically. Most databases, be
they of employee records or experimental results, are structured and yet at the same time hard to
compute from first principles (the relevant $U_P$ is expensive to compute); for example, an alphabetical
list of names is structured, yet computing it most likely cannot be done any faster than
by separately adding each entry, an $O(N)$ time operation. For this reason, it is an unfortunate
historical accident that Grover’s algorithm was ever called a database search algorithm. Contrary
to popular claims, Grover’s algorithm will not aid standard database or Internet searches, since
it takes longer to place the elements in a quantum superposition, which gets destroyed in each
run of the search, than it takes to perform the classical search in the first place: re-creating the
superposition is often linear in $N$, which negates the $O(\sqrt{N})$ benefit of the search algorithm. In
fact, Childs et al. showed that for ordered data, quantum computation can give no more than a
constant factor improvement over optimal classical algorithms.

When the candidate solutions to a problem can be enumerated easily and there is an efficient
test for whether a given value $x$ represents a solution or not, $U_P$ can be computed efficiently, thus
avoiding that concern. The amplitude amplification technique used in Grover’s algorithm has been
extended to provide small speedups for a number of problems, including approximating the mean
of a sequence and other statistics, finding collisions in $r$-to-1 functions, string matching, and path
integration. NP-complete problems also fall into the class of problems for which the relevant $U_P$
is efficiently computable. Unfortunately, amplitude amplification gives only a
quadratic speedup, so problems that require an exponential number of queries classically remain
exponential for Grover’s algorithm. In particular, Grover’s search does not provide a means to
solve NP-complete problems efficiently. Moreover, NP-complete problems have structure that is
exploited in classical heuristic algorithms, and only some of these can be improved upon using
amplitude amplification.

9.7 References 

Grover’s search algorithm was first presented in [143]. Grover extended his algorithm to achieve
quadratic speedup for other non-search problems, such as computing the mean and median of
a function [144]. Using similar techniques, Grover has also shown that certain search problems
that classically run in $O(\log N)$ can be solved in $O(1)$ on a quantum computer [143]. Amplitude
amplification can be used as a subroutine in other quantum computations in light of a result
of Biron et al. [52] that shows how amplitude amplification works with essentially any initial
amplitude distribution while still maintaining $O(\sqrt{N})$ complexity.

Jozsa [166] provides a complementary description of the geometric interpretation of Grover’s 
algorithm and amplitude amplification. 

Bennett, Bernstein, Brassard, and Vazirani [41] give the earliest proof of optimality of Grover’s
algorithm. Boyer et al. [58] provide a detailed analysis of the performance of Grover’s algorithm
and give a solution to the recurrence relation of section 9.1.4. A tighter version of the optimality
of Grover’s algorithm is given by Zalka [290].
Boyer et al. [58] present the strategy for choosing a random number of iterations when $t$ is
unknown. They also present a more efficient algorithm for large $t$ based on the same principle. The
idea of quantum counting is due to Brassard et al. [62]. Their paper contains a detailed analysis
that includes a strategy for iterating to find $M$. The two modifications of Grover’s algorithm that
show that it is not inherently probabilistic are in this same paper.

Childs et al. [81] showed that quantum computation can give no more than a constant factor 
improvement over optimal classical algorithms for searches of ordered data. Both Viamontes 
et al. [277] and Zalka [291] discuss issues related to practical use of Grover’s search algorithm 
and its generalizations. 

Extensions to Grover’s algorithm include approximating the mean of a sequence and other statistics [144,
216], finding collisions in $r$-to-1 functions [61], string matching [234], and path integration
[271]. Grover’s algorithm has been generalized to support nonbinary labelings [58], arbitrary
initial conditions [52], and nested searches [77].

9.8 Exercises 

Exercise 9.1. Verify that

$$g_i = \sin((2i+1)\theta), \qquad b_i = \cos((2i+1)\theta),$$

with $\sin\theta = \sqrt{t} = \sqrt{|G|/N}$, is a solution to the recurrence relations of section 9.1.4.

Exercise 9.2. Show that applying Grover’s algorithm in the case $t = |G|/N = 1/2$ results in no
improvement.

Exercise 9.3. What happens if we try to apply Grover’s algorithm to the case $t = |G|/N = 3/4$?

Exercise 9.4. 

a. How many iterations should Grover’s algorithm use in order to find one item among sixteen? 

b. If we apply one fewer than the optimal number of iterations and then measure, how does the 
success probability compare to the optimal case? 

c. If we apply one more than the optimal number of iterations and then measure, how does the 
success probability compare to the optimal case? 

Exercise 9.5. Suppose $P : \{0, \ldots, N-1\} \to \{0, 1\}$ is zero except at $x = t$, and suppose we are
given not only a quantum oracle $U_P$, but also the information that the solution $t$ differs from a
known string $s$ in exactly $k$ bits. Exhibit an algorithm that finds the solution with $O(\sqrt{2^k})$ calls
to $U_P$.

Exercise 9.6. Suppose $P : \{0, \ldots, N-1\} \to \{0, 1\}$ is zero except at $x = t$, and suppose we are
given not only a quantum oracle $U_P$, but also the information that all suffixes except 010 and 100
have been ruled out. In other words, the solution $t$ must end with either 010 or 100. Exhibit an
algorithm that finds the solution with fewer calls to $U_P$ than Grover’s algorithm.

Exercise 9.7. Suppose $P : \{0, \ldots, N-1\} \to \{0, 1\}$ is zero except at $x = t$, and suppose we are
given not only a quantum oracle $U_P$, but also the information that the solution $t$ differs from a
known string $s$ in at most $k$ bits. Exhibit an algorithm that is more efficient, in terms of the number
of calls needed, than $O(\sqrt{2^n})$.

Exercise 9.8. Suppose $P : \{0, \ldots, N-1\} \to \{0, 1\}$ is zero except at $x = t$, and suppose we are
given a quantum oracle $U_P$ and told that with probability 0.9 the first $n/2$ bits of the solution $t$ are
zero. How can we take advantage of this information to obtain an algorithm that is more efficient,
in terms of the number of calls needed, than $O(\sqrt{2^n})$?

Exercise 9.9. Given a quantum black box for a function $f : \{0, \ldots, N-1\} \to \{0, \ldots, N-1\}$,
design a quantum algorithm that finds the minimum with $O(\sqrt{N}\log N)$ queries, where $N = 2^n$.

Exercise 9.10. Suppose there is an error in the initial state, so that instead of starting with $|00\ldots0\rangle$
we run Grover’s algorithm, starting with the state

$$\frac{1}{\sqrt{1 + \epsilon^2}}\big(|00\ldots0\rangle + \epsilon|11\ldots1\rangle\big).$$

How does this error affect the results of Grover’s algorithm?

Exercise 9.11. Why does applying amplitude amplification to the output of a first application of
amplitude amplification not result in an additional square root reduction in the query complexity? 

Exercise 9.12. Prove the optimality of Grover’s algorithm in the multiple solution case. 

Exercise 9.13. For the quantum counting procedure of section 9.5.2, show how the estimate of $t$
is obtained in the case that a bad state is measured.

10 Quantum Subsystems and Properties of Entangled States


The power of quantum computation is most often attributed to entanglement. Indeed, Jozsa and 
Linden [167] have shown that any quantum algorithm that achieves exponential speedup over 
classical algorithms must make use of entanglement between a number of qubits that increases with 
the size of the input to the algorithm. Nevertheless, as section 13.9 explains, the exact role of
entanglement in quantum computation, and in quantum information processing more generally, remains
unclear. While entanglement is generally viewed as an important resource for quantum computation,
entanglement, particularly multipartite entanglement, is still only poorly understood.

This chapter surveys some of what is known about entanglement, particularly multipartite 
entanglement, entanglement between three or more subsystems. It also illustrates some of the 
complexities that make developing a deeper understanding of entanglement challenging. For 
example, there are many distinctly different types of multipartite entanglement; for entanglement 
between four (or more) subsystems, the different types of entanglement are uncountably infinite! 
Much work remains to be done to understand which types of entanglement are useful, and for what. 

As chapters 3 and 4 emphasize, the notion of entanglement is well defined only with respect 
to a particular tensor decomposition of the system into subsystems. A deeper understanding of 
entanglement comes from studying these quantum subsystems. Of particular interest will be 
entanglement between the $n$ subsystems consisting of the single qubits making up an $n$-qubit 
system and entanglement between registers of a quantum computer during a computation. Density 
operators, introduced in section 10.1, are used to model quantum systems, and more particularly 
how a state appears when only one subsystem is accessible. Density operators are also useful for 
describing measurements yet to be performed, or for which the outcome is not yet known. The 
mathematics of density operators enables a more detailed examination of entanglement, including 
issues related to quantifying how much entanglement a state contains and distinguishing different 
types of entanglement. 

Section 10.2 uses the density operator formalization to quantify bipartite entanglement and to 
examine properties of multipartite entanglement. How density operators model measurements 
is the concern of section 10.3, and section 10.4 discusses transformations of quantum systems. 
Properties of quantum subsystems give insight not only into entanglement, but also into 
robustness issues in quantum computation. From a practical standpoint, any quantum system such as 
a quantum computer is always really a quantum subsystem: any experimental setup is never 
completely isolated from the rest of the universe, so any experiment is rightly viewed as only 
one part of a larger quantum system. The final section of this chapter, section 10.4.4, shows how 
decoherence, errors caused by interaction with the environment, can be modeled. This model 
forms the foundation for the discussion of quantum error correction and fault-tolerant quantum 
computation in chapters 11 and 12. 

10.1 Quantum Subsystems and Mixed States 

It commonly arises that we have access to, or interest in, only one part of a larger system. 
This section develops concepts and notation to support the study of quantum subsystems and 
entanglement between subsystems. 

Some of the issues in modeling quantum subsystems are illustrated by an EPR pair distributed 
between two parties. Imagine that Alice has the first qubit of an EPR pair $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, and 
Bob has the second. How would Alice describe her qubit? It is not in a single-qubit quantum state, 
a quantum state of the form $a|0\rangle + b|1\rangle$. Were Alice to measure her qubit in the standard basis, 
she would have a 50 percent chance of seeing $|0\rangle$ and a 50 percent chance of seeing $|1\rangle$. So it might 
appear that her qubit's state must be an even superposition of $|0\rangle$ and $|1\rangle$. But if she measured it 
in the basis $\{|+\rangle, |-\rangle\}$, she would have a 50 percent chance of seeing $|+\rangle$ and a 50 percent chance 
of seeing $|-\rangle$. In fact, in any basis whatsoever, it appears 
to be an even superposition of the two basis states. But no single-qubit state has this property. For 
example, the state $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ is an even superposition in the standard basis but is an uneven 
superposition in most bases and is deterministic in the basis $\{|+\rangle, |-\rangle\}$. So what can Alice say 
about her qubit? 

To answer that question, it is worth looking carefully at what is meant by a state of a system. A 
state captures all information that could conceivably be learned about the system. Since 
information can only be gained by measurement, and measurement changes the quantum state, imagine 
an infinite supply of identically prepared quantum systems. The quantum state encapsulates all 
information that could be gained from any number of measurements on this infinite supply of 
identical quantum systems. 

Another way of saying that most states of a multiple-qubit quantum system cannot be described 
in terms of the states of each of its single-qubit subsystems separately is that a single qubit of 
a multiple-qubit system is generally not in a well-defined quantum state. Alice’s qubit of the 
entangled pair is such a case. An $n$-qubit quantum state captures all of the information that could 
conceivably be learned from measurements on an infinite supply of identically prepared quantum 
systems. For an infinite supply of $m$-qubit subsystems of identically prepared $n$-qubit systems, it 
is interesting to ask what can be learned from measurements of the $m$-qubit subsystem alone. The 
structure that encapsulates that information is called the mixed state of the $m$-qubit subsystem, 
and it will be modeled by the mathematics of density operators. So far we have considered only 
systems that are universes unto themselves; the states of such systems, all the states we have 
studied so far, are called pure states. 

The meaning of state in mixed state should be interpreted with care. That a subsystem always 
has a well-defined mixed state should not be interpreted to mean that when a state of a system 
is entangled with respect to a decomposition into subsystems, the states of the subsystems are 
well defined after all; mixed states are not quantum states in the conventional sense, the sense we 
have used up to now. Knowing the mixed states of all the subsystems making up a system does 
not enable us to know the state of the entire system; many different states of a system give the 
same set of mixed states on the subsystems. Knowing the mixed states of all the subsystems gives 
full knowledge of the state of the whole system precisely when the state of the entire system is 
unentangled with respect to that subspace decomposition. In exactly this case, the mixed states 
of the subsystems can be viewed as pure states. The relationship of a mixed state for a subsystem 
to the pure state of the whole system is analogous to the relationship of a marginal distribution to 
a joint distribution. This analogy can be made precise; see appendix A. 

The following section develops the mathematics of density operators for modeling mixed states. 
It concludes with a description of Alice’s qubit. 

10.1.1 Density Operators 

For an $m$-qubit subsystem $A$ of a larger $n$-qubit system $X = A \otimes B$, the mixed state for subsystem 
$A$ must capture all possible results of measurement by operators of the form $O \otimes I$, where $O$ 
is a measurement operator on just the $m$ qubits of $A$ and $I$ is the identity on the $n - m$ qubits 
of $B$. Let $|x\rangle$ be a state of the entire $n$-qubit system. The next few paragraphs culminate in the 
description of an operator on the $2^m$-dimensional complex vector space $A$, called the density 
operator $\rho_x^A : A \to A$, that captures all of the information that can be gained about $|x\rangle$ from 
measurements on the $m$-qubit subsystem $A$ alone. For this reason, density operators are used to 
model mixed states. 

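Let $M = 2^m$ and $L = 2^{n-m}$. Given bases $\{|\alpha_0\rangle, \dots, |\alpha_{M-1}\rangle\}$ and $\{|\beta_0\rangle, \dots, |\beta_{L-1}\rangle\}$ for $A$ and 
$B$, respectively, $\{|\alpha_i\rangle \otimes |\beta_j\rangle\}$ is a basis for $X = A \otimes B$. A state $|x\rangle$ of $X$ can be written 

$$|x\rangle = \sum_{i=0}^{M-1} \sum_{j=0}^{L-1} x_{ij} |\alpha_i\rangle|\beta_j\rangle.$$

Measurements on system $A$ alone are modeled by observables $O^A$ with associated projectors 
$\{P_i^A\}$, $0 \le i < 2^m$. On the whole space $X$, such measurements have the form $O^A \otimes I^B$ with 
projectors $P_i^A \otimes I^B$. For any particular projector $P^A$, $\langle x|P^A \otimes I|x\rangle$ gives the probability that 
measurement of $|x\rangle$ by $O^A \otimes I^B$ results in a state in the subspace associated to $P^A$. Writing this 
probability in terms of the bases $\{|\alpha_0\rangle, \dots, |\alpha_{M-1}\rangle\}$ and $\{|\beta_0\rangle, \dots, |\beta_{L-1}\rangle\}$ yields 

$$\langle x|P^A \otimes I|x\rangle = \Big(\sum_{ij} \overline{x_{ij}} \langle\alpha_i|\langle\beta_j|\Big) (P^A \otimes I) \Big(\sum_{kl} x_{kl} |\alpha_k\rangle|\beta_l\rangle\Big) = \sum_{ijkl} \overline{x_{ij}} x_{kl} \langle\alpha_i|P^A|\alpha_k\rangle \langle\beta_j|\beta_l\rangle,$$

where indices $i$ and $k$ are summed over $[0 \dots M-1]$, and $j$ and $l$ are summed over $[0 \dots L-1]$. 
Since $\langle\beta_j|\beta_l\rangle = \delta_{lj}$, each term is zero except those for which $j = l$, so the probability that the 
measurement outcome is the one associated with $P^A$ can be written more concisely as 

$$\langle x|P^A \otimes I|x\rangle = \sum_{ijk} \overline{x_{ij}} x_{kj} \langle\alpha_i|P^A|\alpha_k\rangle. \qquad (10.1)$$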

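This formula, together with facts about the partial trace found in box 10.3, yields the density 
operator that encapsulates all information that can be gained by measurements of the form 
$O^A \otimes I^B$. Since $\{|\alpha_u\rangle\}$ is a basis for $A$, 

$$\sum_{u=0}^{M-1} |\alpha_u\rangle\langle\alpha_u| = I$$

is the identity operator on $A$. We write 

$$\begin{aligned}
\langle x|P^A \otimes I|x\rangle &= \sum_{ijk} \overline{x_{ij}} x_{kj} \langle\alpha_i|P^A|\alpha_k\rangle \\
&= \sum_{ik} \sum_j \overline{x_{ij}} x_{kj} \langle\alpha_i|P^A \Big(\sum_u |\alpha_u\rangle\langle\alpha_u|\Big) |\alpha_k\rangle \\
&= \sum_u \sum_{ik} \sum_j \overline{x_{ij}} x_{kj} \langle\alpha_u|\alpha_k\rangle \langle\alpha_i|P^A|\alpha_u\rangle \\
&= \sum_u \langle\alpha_u| \Big(\sum_{ik} \sum_j \overline{x_{ij}} x_{kj} |\alpha_k\rangle\langle\alpha_i|\Big) P^A |\alpha_u\rangle \\
&= \operatorname{tr}(\rho_x^A P^A),
\end{aligned}$$

where we define 

$$\rho_x^A = \sum_{ik} \sum_j \overline{x_{ij}} x_{kj} |\alpha_k\rangle\langle\alpha_i| \qquad (10.2)$$

and call $\rho_x^A$ the density operator for $|x\rangle$ on subsystem $A$. By box 10.3, 

$$\rho_x^A = \operatorname{tr}_B(|x\rangle\langle x|). \qquad (10.3)$$

Since $O^A$ is a general observable on $A$, and $P^A$ a general projector associated with $O^A$, this 
calculation shows that all information from measurements on subsystem $A$ alone can be gained 
from the density operator $\rho_x^A$. Thus the density operator $\rho_x^A$ models the mixed state corresponding 
to the part of $|x\rangle$ in $A$. 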
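
The identity between equation 10.1 and the trace formula can be checked numerically. The following is a minimal sketch for a two-qubit system, assuming subsystems $A$ and $B$ are single qubits; the function names and the sample state are illustrative choices, not notation from the text.

```python
# Illustrative check of <x|(P tensor I)|x> = tr(rho_A P) for two qubits.
# Function names and the sample state are assumptions made for this sketch.

def reduced_density_A(x):
    """Equation 10.2 for the first qubit of a two-qubit state.

    x = [x00, x01, x10, x11], where x[2*i + j] is the amplitude of |a_i>|b_j>.
    Returns rho with rho[k][i] the coefficient of |a_k><a_i|.
    """
    rho = [[0j, 0j], [0j, 0j]]
    for i in range(2):
        for k in range(2):
            for j in range(2):
                rho[k][i] += complex(x[2 * i + j]).conjugate() * x[2 * k + j]
    return rho


def prob_direct(x, P):
    """Equation 10.1: sum over i, j, k of conj(x_ij) x_kj <a_i|P|a_k>."""
    total = 0j
    for i in range(2):
        for k in range(2):
            for j in range(2):
                total += complex(x[2 * i + j]).conjugate() * x[2 * k + j] * P[i][k]
    return total.real


def prob_from_density(x, P):
    """tr(rho_A P), computed from the reduced density matrix alone."""
    rho = reduced_density_A(x)
    return sum(rho[u][v] * P[v][u] for u in range(2) for v in range(2)).real


state = [0.5, 0.5j, -0.5, 0.5]       # a normalized two-qubit state
P_plus = [[0.5, 0.5], [0.5, 0.5]]    # matrix of the projector onto |+>
print(prob_direct(state, P_plus), prob_from_density(state, P_plus))
```

Both numbers agree, which is the content of the derivation: once the density matrix for $A$ is known, the amplitudes of the full state are no longer needed to predict measurements on $A$.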

This definition of a density operator of $|x\rangle$ is physically reasonable only if it does not depend 
on the choice of basis $\{|\alpha_i\rangle\}$, since physically no basis is preferred. The next two paragraphs show 
that density operators are well defined in the sense that calculating $\rho_x^A$ in different bases gives 
the same operator. We first prove the result for density operators of pure states and then use that 
result to prove the general case. 

Box 10.1 

The Trace of an Operator 

To define the trace of an operator $O$ acting on a vector space $V$, we first define the trace of a matrix 
for $O$ and then show that the trace is basis independent and therefore a property of an operator, not 
of the specific matrix representation used. The trace of a matrix $M$ for $O : V \to V$ is the sum of its 
diagonal elements: 

$$\operatorname{tr}(M) = \sum_i \langle v_i|M|v_i\rangle,$$

where $\{|v_i\rangle\}$ is the basis for $V$ with respect to which the matrix $M$ is written. The following identities 
are easily verified: 

$$\operatorname{tr}(M_1 + M_2) = \operatorname{tr}(M_1) + \operatorname{tr}(M_2),$$
$$\operatorname{tr}(\alpha M) = \alpha \operatorname{tr}(M),$$
$$\operatorname{tr}(M_1 M_2) = \operatorname{tr}(M_2 M_1).$$

The last equality implies $\operatorname{tr}(C^{-1} M C) = \operatorname{tr}(M)$ for any invertible matrix $C$, which means that the 
trace is invariant under basis change. Thus the notion of a trace of an operator is independent of basis, 
so we can simply talk about $\operatorname{tr}(O)$ without specifying a basis. 

Useful fact: For any $|\psi_1\rangle$ and $|\psi_2\rangle$ in a space $V$, and any operator $O$ acting on $V$, 

$$\langle\psi_1|O|\psi_2\rangle = \operatorname{tr}(|\psi_2\rangle\langle\psi_1| O). \qquad (10.4)$$

The proof of this fact illustrates a common way of reasoning about traces. For any basis $\{|\alpha_i\rangle\}$ for $V$, 

$$\begin{aligned}
\operatorname{tr}(|\psi_2\rangle\langle\psi_1| O) &= \sum_i \langle\alpha_i|\psi_2\rangle \langle\psi_1|O|\alpha_i\rangle \\
&= \sum_i \langle\psi_1|O|\alpha_i\rangle \langle\alpha_i|\psi_2\rangle \\
&= \langle\psi_1| O \Big(\sum_i |\alpha_i\rangle\langle\alpha_i|\Big) |\psi_2\rangle.
\end{aligned}$$

Since $\sum_i |\alpha_i\rangle\langle\alpha_i|$ is the identity matrix, the result follows. 

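Suppose the subsystem under consideration is the whole system, $A = X$. The system is in a 
pure state $|x\rangle$, written as $|x\rangle = \sum_i x_i |\psi_i\rangle$ in the basis $\{|\psi_i\rangle\}$ for $X$. The density operator (equation 
10.2) becomes 

$$\rho_x^X = \sum_{ik} \overline{x_i} x_k |\psi_k\rangle\langle\psi_i| = |x\rangle\langle x|.$$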
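
Box 10.2 

Restricting Operators to Subsystems 

Corresponding to any operator $O_{AB}$ on $A \otimes B$, there is a family of operators on subsystem $A$ that is 
parametrized by pairs of elements from $B$. Any pair of states $|b_1\rangle$ and $|b_2\rangle$ in $B$ defines an operator 
on $A$ denoted by $\langle b_1|O_{AB}|b_2\rangle$. We first define the operator $\langle b_1|O_{AB}|b_2\rangle$ in terms of a basis $\{|\alpha_i\rangle\}$ 
for $A$, and then show that it is independent of basis, so any basis for $A$ defines the same operator. 
Operator $\langle b_1|O_{AB}|b_2\rangle$ acts as follows: 

$$\langle b_1|O_{AB}|b_2\rangle : A \to A, \qquad |x\rangle \mapsto \sum_i \langle\alpha_i|\langle b_1|\, O_{AB}\, |x\rangle|b_2\rangle \, |\alpha_i\rangle. \qquad (10.5)$$

This notation takes some getting used to. It may help the reader to begin by writing the operator 
$\langle b_1|O_{AB}|b_2\rangle$ as $\langle \_, b_1|O_{AB}|\_, b_2\rangle$. 

To prove basis independence, let $\{|\alpha'_j\rangle\}$ be another basis for $A$ with $|\alpha'_j\rangle = \sum_i a_{ij} |\alpha_i\rangle$. Then 

$$\begin{aligned}
\langle b_1|O_{AB}|b_2\rangle |a\rangle &= \sum_j \langle\alpha'_j|\langle b_1|\, O_{AB}\, |a\rangle|b_2\rangle \, |\alpha'_j\rangle \\
&= \sum_j \Big(\sum_i \overline{a_{ij}} \langle\alpha_i|\Big) \langle b_1|\, O_{AB}\, |a\rangle|b_2\rangle \Big(\sum_k a_{kj} |\alpha_k\rangle\Big) \\
&= \sum_i \sum_k \sum_j \overline{a_{ij}} a_{kj} \langle\alpha_i|\langle b_1|\, O_{AB}\, |a\rangle|b_2\rangle \, |\alpha_k\rangle \\
&= \sum_i \langle\alpha_i|\langle b_1|\, O_{AB}\, |a\rangle|b_2\rangle \, |\alpha_i\rangle,
\end{aligned}$$

where the last line follows because $\{|\alpha_i\rangle\}$ and $\{|\alpha'_j\rangle\}$ are both orthonormal bases, so $\sum_j \overline{a_{ij}} a_{kj} = \delta_{ik}$. These restricted operators 
are useful for defining the partial trace (box 10.3), the canonical restriction of $O_{AB}$ to subsystem $A$, 
and the operator sum decomposition discussed in section 10.4. 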


Thus, the density operator of a pure state $\rho_x^X = |x\rangle\langle x|$ is independent of the basis for $X$. As with 
any operator, a matrix representation for the operator does depend on the basis. In basis $\{|\psi_i\rangle\}$, 
the $ij$th entry of the matrix for $\rho_x^X$ is $\overline{x_j} x_i$. The diagonal elements $\overline{x_i} x_i$ of the matrix have special 
meaning for measurements in the basis $\{|\psi_i\rangle\}$: the probability that $|x\rangle$ will be measured as being 
in the basis state $|\psi_i\rangle$ with projector $P_i = |\psi_i\rangle\langle\psi_i|$ is 

$$\langle x|P_i|x\rangle = \langle x|\psi_i\rangle \langle\psi_i|x\rangle = \overline{x_i} x_i.$$

In the general case ($A \ne X$), let $X = A \otimes B$, and let $\{|\alpha_i\rangle\}$ and $\{|\beta_j\rangle\}$ be bases for $A$ and $B$ 
respectively. The matrix for the density operator $\rho_x^X$ of the state $|x\rangle = \sum_{ij} x_{ij} |\alpha_i\rangle|\beta_j\rangle$ of $X$ in the 
basis $\{|\alpha_i\rangle|\beta_j\rangle\}$ has entries $\overline{x_{ij}} x_{kl}$: 

$$\rho_x^X = \sum_{i,k=0}^{M-1} \sum_{j,l=0}^{L-1} \overline{x_{ij}} x_{kl} |\alpha_k\rangle|\beta_l\rangle \langle\alpha_i|\langle\beta_j|.$$

To obtain the density matrix $\rho_x^A$, we use equation 10.3, which says that $\rho_x^A$ is simply the partial 
trace over $B$ of $\rho_x^X$ (see box 10.3): 

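Box 10.3 

The Partial Trace 

For any operator $O_{AB}$ on $A \otimes B$, the partial trace of $O_{AB}$ with respect to subsystem $B$ is an operator 
$\operatorname{tr}_B O_{AB}$ on subsystem $A$ defined by 

$$\operatorname{tr}_B O_{AB} = \sum_i \langle\beta_i|O_{AB}|\beta_i\rangle,$$

where $\{|\beta_i\rangle\}$ is a basis for $B$. The operators $\langle\beta_i|O_{AB}|\beta_i\rangle$ were defined in box 10.2. The partial trace 
$\operatorname{tr}_B O_{AB}$ is basis independent by an argument similar to the one given for $\langle b_1|O_{AB}|b_2\rangle$ in box 10.2. 
In terms of bases $\{|\alpha_i\rangle\}$ and $\{|\beta_j\rangle\}$ for $A$ and $B$ respectively, the matrix for $\operatorname{tr}_B O_{AB}$ has entries 

$$(\operatorname{tr}_B O_{AB})_{ij} = \sum_{k=0}^{M-1} \langle\alpha_i|\langle\beta_k|\, O_{AB}\, |\alpha_j\rangle|\beta_k\rangle,$$

so the matrix for $\operatorname{tr}_B O_{AB}$ is 

$$\operatorname{tr}_B O_{AB} = \sum_{i,j=0}^{N-1} \Big(\sum_{k=0}^{M-1} \langle\alpha_i|\langle\beta_k|\, O_{AB}\, |\alpha_j\rangle|\beta_k\rangle\Big) |\alpha_i\rangle\langle\alpha_j|,$$

where $N$ and $M$ are the dimensions of $A$ and $B$ respectively. In the special case in which $O_{AB} = 
|x\rangle\langle x|$, let $x_{ij} \overline{x_{kl}}$ be the entries of $O_{AB}$ in the basis $\{|\alpha_i\rangle|\beta_j\rangle\}$, so 

$$\begin{aligned}
O_{AB} &= \sum_{ij} x_{ij} |\alpha_i\rangle|\beta_j\rangle \sum_{kl} \overline{x_{kl}} \langle\alpha_k|\langle\beta_l| \\
&= \sum_{ijkl} x_{ij} \overline{x_{kl}} |\alpha_i\rangle|\beta_j\rangle \langle\alpha_k|\langle\beta_l|.
\end{aligned}$$

Then 

$$\operatorname{tr}_B(O_{AB}) = \operatorname{tr}_B(|x\rangle\langle x|) = \sum_{i,k=0}^{N-1} \sum_{j=0}^{M-1} x_{ij} \overline{x_{kj}} |\alpha_i\rangle\langle\alpha_k|.$$

In the special case in which an operator is the tensor product of operators on the separate subsystems, 
the partial trace has the simple form $\operatorname{tr}_B(O_A \otimes O_B) = O_A \operatorname{tr}(O_B)$. 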
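
$$\begin{aligned}
\rho_x^A &= \operatorname{tr}_B(\rho_x^X) \\
&= \operatorname{tr}_B \Big(\sum_{i,k=0}^{M-1} \sum_{j,l=0}^{L-1} \overline{x_{ij}} x_{kl} |\alpha_k\rangle|\beta_l\rangle \langle\alpha_i|\langle\beta_j|\Big) \\
&= \sum_{u,v=0}^{M-1} \sum_{w=0}^{L-1} \langle\alpha_u|\langle\beta_w| \Big(\sum_{i,k=0}^{M-1} \sum_{j,l=0}^{L-1} \overline{x_{ij}} x_{kl} |\alpha_k\rangle|\beta_l\rangle \langle\alpha_i|\langle\beta_j|\Big) |\alpha_v\rangle|\beta_w\rangle \; |\alpha_u\rangle\langle\alpha_v| \\
&= \sum_{u,v=0}^{M-1} \sum_{w=0}^{L-1} \overline{x_{vw}} x_{uw} |\alpha_u\rangle\langle\alpha_v|.
\end{aligned}$$

Since the partial trace is basis independent, so is the density operator. 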


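Example 10.1.1 Let us return to Alice, who controls the first qubit of the EPR pair $|\psi\rangle = 
\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ while Bob controls the second. The density matrix for the pure state $|\psi\rangle \in 
A \otimes B$ is 

$$\begin{aligned}
\rho_\psi &= |\psi\rangle\langle\psi| \\
&= \frac{1}{2}\big(|00\rangle\langle 00| + |00\rangle\langle 11| + |11\rangle\langle 00| + |11\rangle\langle 11|\big) \\
&= \frac{1}{2} \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}.
\end{aligned}$$

The mixed state of Alice's qubit, which encapsulates all information that could be obtained from 
any sequence of measurements on Alice's qubit alone on a sequence of identical states $|\psi\rangle$, is 
modeled by the density matrix $\rho_\psi^A$ obtained from $\rho_\psi$ by tracing over Bob's qubit, $\rho_\psi^A = \operatorname{tr}_B \rho_\psi$. 
The four entries $a_{00}$, $a_{01}$, $a_{10}$, and $a_{11}$ for a matrix representing $\rho_\psi^A$ in the standard basis can be 
computed separately: 

$$a_{00} = \sum_{j=0}^{1} \langle 0|\langle j|\, |\psi\rangle\langle\psi|\, |0\rangle|j\rangle = \frac{1}{2}(1 + 0) = \frac{1}{2},$$

$$a_{01} = \sum_{j=0}^{1} \langle 0|\langle j|\, |\psi\rangle\langle\psi|\, |1\rangle|j\rangle = \frac{1}{2}(0 + 0) = 0,$$

$$a_{10} = \sum_{j=0}^{1} \langle 1|\langle j|\, |\psi\rangle\langle\psi|\, |0\rangle|j\rangle = \frac{1}{2}(0 + 0) = 0,$$

$$a_{11} = \sum_{j=0}^{1} \langle 1|\langle j|\, |\psi\rangle\langle\psi|\, |1\rangle|j\rangle = \frac{1}{2}(0 + 1) = \frac{1}{2}.$$

So 

$$\rho_\psi^A = \frac{1}{2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

By symmetry, the density operator for Bob's qubit is 

$$\rho_\psi^B = \frac{1}{2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$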
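
The calculation in example 10.1.1 can be replayed numerically, which also answers the question posed at the start of section 10.1: from Alice's density matrix, every outcome in both the standard basis and the basis $\{|+\rangle, |-\rangle\}$ has probability one half. This is only a sketch; the helper names `partial_trace_B` and `outcome_probability` are made up for illustration.

```python
import math

# Sketch: Alice's reduced density matrix for the EPR pair and the outcome
# probabilities it predicts. Helper names are illustrative assumptions.

def partial_trace_B(x):
    """Reduced density matrix of the first qubit of x = [x00, x01, x10, x11].

    rho[u][v] = sum_w x_uw * conj(x_vw), matching the formula for tr_B.
    """
    return [[sum(x[2 * u + w] * complex(x[2 * v + w]).conjugate() for w in range(2))
             for v in range(2)] for u in range(2)]


def outcome_probability(rho, v):
    """<v|rho|v>: probability of the outcome associated with unit vector v."""
    return sum(complex(v[a]).conjugate() * rho[a][b] * v[b]
               for a in range(2) for b in range(2)).real


s = 1 / math.sqrt(2)
epr = [s, 0, 0, s]                       # (|00> + |11>)/sqrt(2)
rho_A = partial_trace_B(epr)

measurement_vectors = {"|0>": [1, 0], "|1>": [0, 1], "|+>": [s, s], "|->": [s, -s]}
for label, v in measurement_vectors.items():
    print(label, round(outcome_probability(rho_A, v), 6))
```

No single-qubit pure state produces this table of probabilities, which is why a density matrix rather than a state vector is needed to describe Alice's qubit.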


In general, it is not possible to recover the state of the entire system from the set of density 
operators for all of the subsystems; information has been lost. For example, for a two-qubit system, 
if the density matrices for each of the two qubits are $\rho^A = \frac{1}{2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and $\rho^B = \frac{1}{2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, 
the state of the two-qubit system as a whole could be $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, as in example 10.1.1, or it 
could be $\frac{1}{\sqrt{2}}(|00\rangle - |11\rangle)$ or $\frac{1}{\sqrt{2}}(|01\rangle + |10\rangle)$ among other possibilities. 
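
The loss of information can be seen concretely by computing both single-qubit density matrices for the three states just mentioned. The sketch below assumes the amplitude ordering $[x_{00}, x_{01}, x_{10}, x_{11}]$; the names `marginals` and `close` are illustrative.

```python
import math

# Sketch: three different two-qubit states with identical single-qubit
# density matrices. Names and amplitude ordering are assumptions.

def marginals(x):
    """Return (rho_A, rho_B) for a two-qubit state x = [x00, x01, x10, x11]."""
    rho_a = [[sum(x[2 * u + w] * complex(x[2 * v + w]).conjugate() for w in range(2))
              for v in range(2)] for u in range(2)]   # trace over the second qubit
    rho_b = [[sum(x[2 * w + u] * complex(x[2 * w + v]).conjugate() for w in range(2))
              for v in range(2)] for u in range(2)]   # trace over the first qubit
    return rho_a, rho_b


def close(m1, m2, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(m1[a][b] - m2[a][b]) < tol for a in range(2) for b in range(2))


s = 1 / math.sqrt(2)
states = {
    "(|00> + |11>)/sqrt(2)": [s, 0, 0, s],
    "(|00> - |11>)/sqrt(2)": [s, 0, 0, -s],
    "(|01> + |10>)/sqrt(2)": [0, s, s, 0],
}
half_identity = [[0.5, 0.0], [0.0, 0.5]]
for label, x in states.items():
    rho_a, rho_b = marginals(x)
    print(label, close(rho_a, half_identity), close(rho_b, half_identity))
```

All three states yield the same pair of density matrices even though the states themselves are mutually orthogonal, so the pair of mixed states cannot distinguish them.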

10.1.2 Properties of Density Operators 

Any density operator $\rho_x^A$ satisfies 

1. $\rho_x^A$ is Hermitian (self-adjoint), 

2. $\operatorname{tr}(\rho_x^A) = 1$, and 

3. $\rho_x^A$ is positive. 

Property (1) follows immediately from the definition (equation 10.2). Since $|x\rangle$ is a unit vector, 
$\operatorname{tr}(\rho_x^A) = \sum_{ij} \overline{x_{ij}} x_{ij} = 1$. An operator $O : V \to V$ being positive means that $\langle v|O|v\rangle$ is real with 
$\langle v|O|v\rangle \ge 0$ for all $|v\rangle$ in $V$. To show that $\rho_x^A : A \to A$ is positive, let $|v\rangle \in A$. Then 

$$\begin{aligned}
\langle v|\rho_x^A|v\rangle &= \sum_{ik} \sum_j \langle v| \big(\overline{x_{ij}} x_{kj} |\alpha_k\rangle\langle\alpha_i|\big) |v\rangle \\
&= \sum_j \sum_{ik} \overline{x_{ij}} \langle\alpha_i|v\rangle \, x_{kj} \langle v|\alpha_k\rangle \\
&= \sum_j \Big| \sum_i x_{ij} \langle v|\alpha_i\rangle \Big|^2 \\
&\ge 0.
\end{aligned}$$

Positivity implies that all of the eigenvalues of $\rho_x^A$ are real and non-negative: if $\lambda$ is an eigenvalue 
of $\rho_x^A$ with unit eigenvector $|v_\lambda\rangle$, then $\lambda = \langle v_\lambda|\rho_x^A|v_\lambda\rangle$ is real and non-negative. It follows from these 
properties that in any (orthonormal) eigenbasis $\{|v_0\rangle, \dots, |v_{M-1}\rangle\}$ for $\rho_x^A$, the matrix for $\rho_x^A$ is a 
diagonal matrix with non-negative real entries $\lambda_i$ that sum to 1. Thus $\rho_x^A = \sum_i \lambda_i |v_i\rangle\langle v_i|$. In this 
way the mixed state with density operator $\rho_x^A$ may be viewed as a mixture of pure states $|v_i\rangle\langle v_i|$ 
or, more precisely, as a probability distribution over these states. 

It turns out that any operator satisfying (1), (2), and (3) is a density operator; in some expositions, 
that is how density operators are first defined. To establish this equivalence, we need to show 
that, for any operator $\rho : A \to A$ satisfying these conditions, there is a pure state $|\psi\rangle$ of a larger 
system $A \otimes B$ such that $\operatorname{tr}_B(|\psi\rangle\langle\psi|) = \rho$. The state $|\psi\rangle$ is called a purification of $\rho$. Let $\rho$ be 
any operator acting on a subsystem $A$ of dimension $M = 2^m$ that satisfies (1), (2), and (3). These 
properties mean that in its eigenbasis $\{|\psi_0\rangle, |\psi_1\rangle, \dots, |\psi_{M-1}\rangle\}$, $\rho$ is diagonal with non-negative 
real eigenvalues $\lambda_i$ that sum to 1. Thus, for any $\rho$, 

$$\rho = \lambda_0 |\psi_0\rangle\langle\psi_0| + \lambda_1 |\psi_1\rangle\langle\psi_1| + \cdots + \lambda_{M-1} |\psi_{M-1}\rangle\langle\psi_{M-1}|$$

for some orthonormal set $\{|\psi_0\rangle, |\psi_1\rangle, \dots, |\psi_{M-1}\rangle\}$. Let $B$ be a quantum system with associated vector space of 
dimension $2^n \ge M$, and let $\{|0\rangle, \dots, |M-1\rangle\}$ be the first $M$ elements of an (orthonormal) basis 
for $B$. Then the pure state $|x\rangle \in A \otimes B$ 

$$|x\rangle = \sqrt{\lambda_0}\, |\psi_0\rangle|0\rangle + \sqrt{\lambda_1}\, |\psi_1\rangle|1\rangle + \cdots + \sqrt{\lambda_{M-1}}\, |\psi_{M-1}\rangle|M-1\rangle$$

satisfies $\rho_x^A = \rho$. 
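
The purification construction can be tried on a small case. The sketch below assumes the eigenvalues and orthonormal eigenvectors of $\rho$ are supplied by the caller (no eigenvalue solver is included), and uses a single-qubit $\rho$ with eigenvectors $|+\rangle$ and $|-\rangle$ chosen purely for illustration; `purify` and `trace_out_B` are hypothetical helper names.

```python
import math

# Sketch of the purification |x> = sum_i sqrt(lambda_i) |psi_i>|i>.
# The eigen-decomposition is assumed to be given; names are illustrative.

def purify(eigenvalues, eigenvectors):
    """Return amplitudes x[a][b] of |x> in A tensor B (A index a, B index b)."""
    dim = len(eigenvalues)
    x = [[0j] * dim for _ in range(dim)]
    for i, (lam, psi) in enumerate(zip(eigenvalues, eigenvectors)):
        for a in range(dim):
            x[a][i] += math.sqrt(lam) * psi[a]   # |psi_i> paired with basis state |i> of B
    return x


def trace_out_B(x):
    """rho[u][v] = sum_w x[u][w] * conj(x[v][w])."""
    dim_a, dim_b = len(x), len(x[0])
    return [[sum(x[u][w] * x[v][w].conjugate() for w in range(dim_b))
             for v in range(dim_a)] for u in range(dim_a)]


s = 1 / math.sqrt(2)
eigenvalues = [0.8, 0.2]
eigenvectors = [[s, s], [s, -s]]          # |+> and |->
# rho = 0.8 |+><+| + 0.2 |-><-| has matrix [[0.5, 0.3], [0.3, 0.5]]
recovered = trace_out_B(purify(eigenvalues, eigenvectors))
for row in recovered:
    print([round(entry.real, 6) for entry in row])
```

Tracing out $B$ from the constructed pure state returns the original matrix, as the argument above predicts.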

For a pure state $|x\rangle$, the density operator $\rho_x^X = |x\rangle\langle x|$ has a particularly simple form in terms 
of a basis that contains $|x\rangle$ as its $i$th element: it is a matrix with all entries 0 except for a single 1 
on the diagonal in the $i$th spot. It follows that the density operator of a pure state is a projector: 
$\rho_x^X \rho_x^X = \rho_x^X$. Conversely, any density operator that is a projector corresponds to a pure state: 
projection operators have only 0 and 1 as eigenvalues, and to obtain trace 1 the density operator 
must have only a single 1-eigenvector, which is the corresponding pure state. 

Another nice property of density operators of pure states is that the non-uniqueness of the 
representation of states due to the global phase disappears. Let $|x\rangle = e^{i\theta}|y\rangle$. The density operator 
corresponding to $|x\rangle$ is $\rho_x = |x\rangle\langle x|$, which is also equal to $|y\rangle\langle y|$, since $\rho_x = |x\rangle\langle x| = e^{i\theta}|y\rangle\langle y|e^{-i\theta} = 
|y\rangle\langle y|$. Thus, any two vectors that differ by a global phase have the same density operator. 

It is important not to confuse mixed states with superpositions. The mixed state that is an 
even probabilistic combination of $|0\rangle$ and $|1\rangle$ is not the same as the pure state superposition 
$|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$. Their density operators are different: in the standard basis, the density 
matrix for the former is 

$$\rho_{ME} = \frac{1}{2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

whereas the density matrix for the latter is 

$$|+\rangle\langle +| = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.$$

The latter gives deterministic results when measured in an appropriate basis, whereas the former 
gives probabilistic results in all bases. 

Mixed states are not viewed as true quantum states, but rather as a way of describing a subsystem 
whose state is not well defined, being only a mixed state, or a probabilistic mixture of well-defined 
pure states. Therefore, state or quantum state will mean a pure state unless it is prefaced with the 
word mixed. Furthermore, when it is clear which subsystem is being talked about, we drop the 
superscript and just say $\rho_x$. 
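
The difference between the even mixture and the superposition $|+\rangle$ shows up as soon as both are measured in the basis $\{|+\rangle, |-\rangle\}$, and also in the projector test $\rho\rho = \rho$ for pure states. A small sketch, with helper names chosen only for this illustration:

```python
import math

# Sketch: the even mixture of |0> and |1> versus the pure state |+>.
# Helper names are illustrative assumptions.

def outcome_probability(rho, v):
    """<v|rho|v>: probability of the outcome associated with unit vector v."""
    return sum(complex(v[a]).conjugate() * rho[a][b] * v[b]
               for a in range(2) for b in range(2)).real


def is_projector(rho, tol=1e-12):
    """True when rho * rho equals rho entrywise (within tol)."""
    square = [[sum(rho[a][c] * rho[c][b] for c in range(2)) for b in range(2)]
              for a in range(2)]
    return all(abs(square[a][b] - rho[a][b]) < tol
               for a in range(2) for b in range(2))


s = 1 / math.sqrt(2)
rho_me = [[0.5, 0.0], [0.0, 0.5]]      # even mixture of |0> and |1>
rho_plus = [[0.5, 0.5], [0.5, 0.5]]    # pure state |+><+|
hadamard_basis = {"|+>": [s, s], "|->": [s, -s]}

for label, rho in [("mixture", rho_me), ("superposition", rho_plus)]:
    probs = {k: round(outcome_probability(rho, v), 6) for k, v in hadamard_basis.items()}
    print(label, probs, "projector:", is_projector(rho))
```

The superposition yields $|+\rangle$ with certainty and passes the projector test, while the mixture stays at one half for each outcome and fails it.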


10.1.3 The Geometry of Single-Qubit Mixed States 

The Bloch sphere (section 2.5.2) can be extended in an elegant way to include single-qubit mixed 
states. Mixed states are convex combinations of pure states, linear combinations of pure states 
with non-negative coefficients that sum to 1, so it is not surprising that single-qubit mixed states 
can be viewed as lying in the interior of the Bloch sphere. The precise connection with the 
geometry uses the fact that density operators are Hermitian (self-adjoint) operators with trace 1. 
Any self-adjoint $2 \times 2$ matrix is of the form 

$$\begin{pmatrix} a & c - id \\ c + id & b \end{pmatrix},$$

where $a$, $b$, $c$, and $d$ are real parameters. Requiring that the matrix have trace 1 means there are 
only 3 real parameters. Such matrices can be written as 

$$\frac{1}{2} \begin{pmatrix} 1 + z & x - iy \\ x + iy & 1 - z \end{pmatrix},$$

where $x$, $y$, and $z$ are real parameters. Thus, any density matrix for a single-qubit system can be 
written as 

$$\frac{1}{2}(I + x\sigma_x + y\sigma_y + z\sigma_z),$$

where 

$$\sigma_x = X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = -iY = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \text{and} \quad \sigma_z = Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$

are the Pauli spin matrices. (The Pauli spin matrices are related to the Pauli group elements $X$, $Y$, and $Z$ of section 
5.2.1 by $\sigma_x = X$, $\sigma_y = -iY$, and $\sigma_z = Z$.) 

The determinant of a single-qubit density operator $\rho = \frac{1}{2}(I + x\sigma_x + y\sigma_y + z\sigma_z)$ has geometric 
meaning; it is easily computed to be 

$$\det(\rho) = \frac{1}{4}(1 - r^2),$$

where $r = \sqrt{|x|^2 + |y|^2 + |z|^2}$ is the radial distance from the origin in $x$, $y$, $z$ coordinates. Since 
the determinant of $\rho$ is the product of its eigenvalues, which for a density operator must be 
non-negative, $\det(\rho) \ge 0$. So $0 \le r \le 1$. Thus, with $x$, $y$, and $z$ acting as coordinates, the density 
matrices of single-qubit mixed states $\rho = \frac{1}{2}(I + x\sigma_x + y\sigma_y + z\sigma_z)$ all lie within a sphere of radius 
1. The density matrices for states on the boundary of the sphere have $\det(\rho) = 0$; one of their 
eigenvalues must be 0. Since density operators have trace 1, the other eigenvalue must be 1. Thus, 
density operators on the boundary of the sphere are projectors, which means they are pure states. 
We have recovered the boundary of the Bloch sphere discussed in section 2.5 as the boundary 
of a ball for which the Pauli spin matrices provide the coordinates. This entire ball is called the 
Bloch sphere (though it ought to be called the Bloch ball). 
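
Reading the coordinates off a density matrix is mechanical: comparing with the matrix form above, $x$ and $y$ are twice the real and imaginary parts of the lower-left entry, and $z$ is the difference of the diagonal entries. The sketch below uses illustrative function names and checks the determinant formula on a few matrices.

```python
# Sketch: Bloch coordinates of a single-qubit density matrix and the
# relation det(rho) = (1 - r^2)/4. Function names are illustrative.

def bloch_coordinates(rho):
    """(x, y, z) such that rho = (I + x*sigma_x + y*sigma_y + z*sigma_z)/2."""
    lower_left = complex(rho[1][0])            # equals (x + i*y)/2
    x = 2 * lower_left.real
    y = 2 * lower_left.imag
    z = (complex(rho[0][0]) - complex(rho[1][1])).real
    return x, y, z


def determinant(rho):
    """Determinant of a 2x2 matrix (real for a Hermitian matrix)."""
    return (complex(rho[0][0]) * rho[1][1] - complex(rho[0][1]) * rho[1][0]).real


examples = {
    "|+>": [[0.5, 0.5], [0.5, 0.5]],
    "|i>": [[0.5, -0.5j], [0.5j, 0.5]],
    "|0>": [[1, 0], [0, 0]],
    "maximally mixed": [[0.5, 0], [0, 0.5]],
}
for name, rho in examples.items():
    x, y, z = bloch_coordinates(rho)
    r_squared = x * x + y * y + z * z
    print(name, (x, y, z), determinant(rho), (1 - r_squared) / 4)
```

The three pure states land on the boundary ($r = 1$, determinant 0), and the maximally mixed state sits at the center.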

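The following table gives the density matrices, in the standard basis, and the Bloch sphere 
coordinates for some familiar states and mixed states. 

$(x, y, z)$ coordinate    state vector    density matrix 

$(1, 0, 0)$    $|+\rangle$    $\frac{1}{2}(I + \sigma_x) = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$ 

$(0, 1, 0)$    $|i\rangle$    $\frac{1}{2}(I + \sigma_y) = \frac{1}{2} \begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix}$ 

$(0, 0, 1)$    $|0\rangle$    $\frac{1}{2}(I + \sigma_z) = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ 

$(0, 0, 0)$    (none)    $\frac{1}{2} I = \frac{1}{2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ 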

The set of all density operators for mixed states of an $n$-qubit system with $n \ge 2$ also forms 
a convex set, but its geometry is significantly more complicated than the simple Bloch sphere 
picture. As one example, in the single-qubit case the boundary of the Bloch sphere contains exactly 
the pure states, whereas for $n \ge 2$, the boundary of the set of all mixed states contains both pure 
and mixed states. The reader may easily check that this statement must be true by computing the 
dimension of the space of $n$-qubit mixed states to be $2^{2n} - 1$ and comparing it to the dimension 
of the space of pure states, which is only $2^{n+1} - 2$. 

10.1.4 Von Neumann Entropy 

The density matrix of one qubit of an EPR pair, 

$$\rho_{ME} = \frac{1}{2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

corresponds to the point $(0, 0, 0)$ in the center of the sphere, farthest from the boundary. In a 
technical sense, this state is the least pure single-qubit mixed state possible: it is the maximally 
uncertain state in that no matter in what basis it is measured, it gives the two possible answers 
with equal probability. In contrast, for any pure state, there is a basis in which measurement gives 
a deterministic result. For no state, mixed or pure, do measurements in two different bases give 
deterministic results, so pure states are as certain as possible. 

This notion of uncertainty can be quantified for general $n$-qubit states by an extension of the 
classical information theoretic notion of entropy. The von Neumann entropy of a mixed state with 
density operator $\rho$ is defined to be 

$$S(\rho) = -\operatorname{tr}(\rho \log_2 \rho) = -\sum_i \lambda_i \log_2 \lambda_i,$$

where $\lambda_i$ are the eigenvalues of $\rho$ (with repeats). As is done for classical entropy, take $0 \log(0) = 0$. 

The von Neumann entropy is zero for pure states; since the density operator $\rho_x$ for a pure state 
$|x\rangle$ is a projector, it has a single eigenvalue 1 and all of its other eigenvalues are 0, so $S(\rho_x) = 0$. Observe 
that the maximally uncertain single-qubit mixed state $\rho_{ME}$ has von Neumann entropy $S(\rho_{ME}) = 1$. 
More generally, a maximally uncertain $n$-qubit state has a density operator that is diagonal with 
entries all $2^{-n}$; a maximally uncertain $n$-qubit state $\rho$ has von Neumann entropy $S(\rho) = n$. 

For a single-qubit state with density operator $\rho$, the von Neumann entropy $S(\rho)$ is related 
to the distance between the point in the Bloch sphere corresponding to $\rho$ and the center of 
the Bloch sphere. Let $\lambda_1$ and $\lambda_2$ be the eigenvalues of $\rho$. Since density operators have trace 1, 
$\lambda_2 = 1 - \lambda_1$. The von Neumann entropy of $\rho$ can be deduced from its determinant: $\det(\rho) = \lambda_1 \lambda_2$, 
so $\det(\rho) = \lambda_1(1 - \lambda_1)$, so $\lambda_1^2 - \lambda_1 + \det(\rho) = 0$, which has solutions 

$$\lambda_1 = \frac{1 + \sqrt{1 - 4\det\rho}}{2}$$

and 

$$\lambda_2 = \frac{1 - \sqrt{1 - 4\det\rho}}{2}.$$

Using $\det(\rho) = \frac{1}{4}(1 - r^2)$ from section 10.1.3, we see that 

$$\lambda_1 = \frac{1 + r}{2}$$

and 

$$\lambda_2 = \frac{1 - r}{2}.$$

So, for single-qubit mixed states, the entropy is simply a function of the radial distance $r$: 

$$S(\rho) = -\left( \frac{1 + r}{2} \log_2 \frac{1 + r}{2} + \frac{1 - r}{2} \log_2 \frac{1 - r}{2} \right). \qquad (10.6)$$
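
Equation 10.6 is easy to evaluate directly. The following sketch (the function name `entropy_from_radius` is an illustrative choice) applies the convention $0 \log(0) = 0$ and prints the entropy at a few radii between the center and the boundary of the Bloch sphere.

```python
import math

# Sketch of equation 10.6: von Neumann entropy of a single-qubit mixed state
# as a function of its distance r from the center of the Bloch sphere.

def entropy_from_radius(r):
    """S = -(l1*log2(l1) + l2*log2(l2)) with l1 = (1+r)/2 and l2 = (1-r)/2."""
    total = 0.0
    for lam in ((1 + r) / 2, (1 - r) / 2):
        if lam > 0:                      # convention: 0 * log(0) = 0
            total -= lam * math.log2(lam)
    return total


for r in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(r, entropy_from_radius(r))
```

The entropy falls from 1 at the center to 0 on the boundary, in line with the observation that pure states are as certain as possible.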

10.2 Classifying Entangled States 

The concepts and notation for quantum subsystems support a deeper study of entanglement. Ever 
since the beginning of the field, entanglement has been recognized as a fundamental resource 
for quantum information processing and a key to what distinguishes it from classical processing. 
Nevertheless, it is still only poorly understood. Only in the simplest case, that of pure states of 
a bipartite system $A \otimes B$, is entanglement well understood. What is known about multipartite 
entanglement is that it is complicated; there are many distinct types of multipartite entanglement 
whose utility and relation are only beginning to be understood. Even for bipartite mixed states there 
are distinct measures of entanglement. Each gives insight into the entanglement resources needed 
for various quantum information processing tasks; no single measure of entanglement will do. 

Recall that entanglement is not an absolute property of a quantum state, but depends on the 
tensor decomposition of the system into subsystems. A (pure) state $|\psi\rangle$ of a quantum system 
with associated vector space $V$ is separable with respect to the tensor decomposition $V = 
V_1 \otimes \cdots \otimes V_n$ if it can be written as 

$$|\psi\rangle = |\psi_1\rangle \otimes \cdots \otimes |\psi_n\rangle,$$

where $|\psi_i\rangle$ is contained in $V_i$. Otherwise, $|\psi\rangle$ is said to be entangled with respect to this 
decomposition. For $n$-qubit systems, we will generally speak of entanglement with respect to the 
decomposition into the $n$ single-qubit systems. Thus, when we say that a state is entangled without 
further qualification, we mean that it is entangled with respect to this decomposition into individual 
qubits. 

For bipartite pure states, it is possible to quantify the amount of entanglement a state contains. 
Any reasonable measure of entanglement should satisfy certain properties. For example, any 
measure of entanglement should take its minimal value, usually zero, on unentangled states. 
Furthermore, performing any sequence of operations, including measurements, on the subsystems 
individually should not increase the value of an entanglement measure. Even allowing the result 
of a measurement on one subsystem to influence which operations are performed on another 
subsystem should not increase the value. Imagine different people in control of each subsystem, 
with only classical communication channels between them. The restricted set of operations they 
can perform is often abbreviated LOCC, for local operations with classical communication. The 
LOCC requirement for any reasonable measure of entanglement means that nothing these people 
can do can increase the value of the entanglement measure. 

10.2.1 Bipartite Quantum Systems 

To find a good measure of entanglement for pure states of a bipartite system $X = A \otimes B$, let 
us look at the simplest of bipartite systems: two-qubit systems. The state $|\psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ 
is maximally entangled in the sense that, when looked at separately, the state of each qubit is as 
uncertain as possible. Tracing over each qubit gives the mixed state $\rho_{ME} = \frac{1}{2} I$. This state has 
maximal von Neumann entropy among all single-qubit mixed states. Similarly, unentangled states are the 
least entangled states possible in the sense that, when looked at separately, the state of each qubit 
is as certain as possible. Tracing over each qubit gives a pure state, a state with zero von Neumann 
entropy. These examples suggest that the von Neumann entropy of the partial trace with respect 
to one of the subsystems might make a good measure of entanglement in bipartite systems. 

In order for this approach to make sense, the von Neumann entropy of the partial trace should 
be the same whether we look at subsystem A or subsystem B. The proof that the two quantities 
are the same relies on the Schmidt decomposition. The Schmidt decomposition also leads directly 
to a coarse measure of entanglement. For any pure state $|\psi\rangle$ of a bipartite system $A \otimes B$, there 
exist orthonormal sets of states $\{|\psi_i^A\rangle\}$ and $\{|\psi_i^B\rangle\}$ such that 

$$|\psi\rangle = \sum_{i=1}^{K} \lambda_i |\psi_i^A\rangle \otimes |\psi_i^B\rangle$$

for some positive real $\lambda_i$ such that $\sum_{i=1}^{K} \lambda_i^2 = 1$. Exercises 10.8 and 10.9 step through a proof 
that a Schmidt decomposition exists for every state $|\psi\rangle$. The $\lambda_i$ are called the Schmidt coefficients, 
and $K$, the number of $\lambda_i$, is called the Schmidt rank or Schmidt number of $|\psi\rangle$. For unentangled 
states, the Schmidt rank is 1. 

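We now use the Schmidt decomposition to check that $S(\operatorname{tr}_A \rho) = S(\operatorname{tr}_B \rho)$. Let $|\psi\rangle$ be a state in a 
bipartite system $X = A \otimes B$, where $A$ and $B$ are general multiple-qubit systems. Let $\rho = |\psi\rangle\langle\psi|$. Let 

$$|\psi\rangle = \sum_{i=0}^{K-1} \lambda_i |\psi_i^A\rangle |\psi_i^B\rangle$$

be a Schmidt decomposition for $|\psi\rangle$. Then 

$$\rho = |\psi\rangle\langle\psi| = \sum_{i=0}^{K-1} \sum_{j=0}^{K-1} \lambda_i \lambda_j \, |\psi_i^A\rangle\langle\psi_j^A| \otimes |\psi_i^B\rangle\langle\psi_j^B|,$$

$$\operatorname{tr}_B \rho = \sum_{i=0}^{K-1} \lambda_i^2 \, |\psi_i^A\rangle\langle\psi_i^A|,$$

and 

$$\operatorname{tr}_A \rho = \sum_{i=0}^{K-1} \lambda_i^2 \, |\psi_i^B\rangle\langle\psi_i^B|.$$

Since $\{|\psi_i^B\rangle\}$ is an orthonormal set, the $\lambda_i^2$ are the nonzero eigenvalues of $\operatorname{tr}_A \rho$, and it follows that 

$$S(\operatorname{tr}_A \rho) = -\sum_{i=0}^{K-1} \lambda_i^2 \log_2 \lambda_i^2.$$

Similarly, 

$$S(\operatorname{tr}_B \rho) = -\sum_{i=0}^{K-1} \lambda_i^2 \log_2 \lambda_i^2.$$

Thus 

$$S(\operatorname{tr}_A \rho) = S(\operatorname{tr}_B \rho).$$

The amount of entanglement between the two parts of a pure state $|\psi\rangle$ of a bipartite system 
$X = A \otimes B$ with density operator $\rho = |\psi\rangle\langle\psi|$ is defined to be 

$$S(\operatorname{tr}_A \rho),$$

or equivalently $S(\operatorname{tr}_B \rho)$. 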
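
For two qubits the equality of the two entropies can be checked without computing a Schmidt decomposition, since the eigenvalues of a $2 \times 2$ density matrix follow from its determinant as in section 10.1.4. This is a sketch under that two-qubit assumption; the sample state and the names `reduced` and `entropy` are illustrative.

```python
import math

# Sketch: the entropy of the reduced state is the same whichever qubit is
# traced out. Restricted to two-qubit pure states; names are illustrative.

def reduced(x, keep_first):
    """Reduced density matrix of one qubit of x = [x00, x01, x10, x11]."""
    rho = [[0j, 0j], [0j, 0j]]
    for u in range(2):
        for v in range(2):
            for w in range(2):
                if keep_first:    # trace over the second qubit
                    rho[u][v] += x[2 * u + w] * complex(x[2 * v + w]).conjugate()
                else:             # trace over the first qubit
                    rho[u][v] += x[2 * w + u] * complex(x[2 * w + v]).conjugate()
    return rho


def entropy(rho):
    """Von Neumann entropy of a 2x2 density matrix via its determinant."""
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    root = math.sqrt(max(0.0, 1 - 4 * det))
    eigenvalues = ((1 + root) / 2, (1 - root) / 2)
    return -sum(lam * math.log2(lam) for lam in eigenvalues if lam > 0)


state = [0.8, 0.0, 0.36, 0.48]      # normalized: 0.64 + 0.1296 + 0.2304 = 1
print(entropy(reduced(state, keep_first=True)))
print(entropy(reduced(state, keep_first=False)))
```

The two reduced matrices are different, yet they share the same determinant and therefore the same entropy, as the Schmidt argument requires.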

We compute this quantity for a variety of bipartite states. To begin with, it is zero on unentangled 
states. 


Example 10.2.1 For $|x\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, recall from example 10.1.1 that $\rho_x^1 = \operatorname{tr}_2 |x\rangle\langle x| = 
\rho_{ME} = \frac{1}{2} I$. Thus, by the formula for the von Neumann entropy for single-qubit mixed 
states, equation 10.6, the amount of entanglement is $S(\rho_{ME}) = 1$. If we work out the 
density matrices for the first qubit of states $\frac{1}{\sqrt{2}}(|01\rangle + |10\rangle)$ and $\frac{1}{\sqrt{2}}(|00\rangle - i|11\rangle)$ we will find 
that these too are equal to $\rho_{ME}$. Such states are among the maximally entangled two-qubit 
states. 


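Example 10.2.2 Let $|x\rangle = \frac{7}{10}|00\rangle + \frac{1}{10}|01\rangle + \frac{1}{10}|10\rangle + \frac{7}{10}|11\rangle$ with density operator $\rho_x = |x\rangle\langle x|$. 

To obtain the density operator $\rho_x^1 = \operatorname{tr}_2 |x\rangle\langle x|$, trace over the second qubit. The four terms that 
make up matrix $\rho_x^1$ in the standard basis are: 

$$\sum_{j=0}^{1} \langle 0|\langle j|\, |x\rangle\langle x|\, |0\rangle|j\rangle \; |0\rangle\langle 0| = \left(\frac{49}{100} + \frac{1}{100}\right) |0\rangle\langle 0| = \frac{1}{2} |0\rangle\langle 0|,$$

$$\sum_{j=0}^{1} \langle 0|\langle j|\, |x\rangle\langle x|\, |1\rangle|j\rangle \; |0\rangle\langle 1| = \left(\frac{7}{10}\cdot\frac{1}{10} + \frac{1}{10}\cdot\frac{7}{10}\right) |0\rangle\langle 1| = \frac{7}{50} |0\rangle\langle 1|,$$

$$\sum_{j=0}^{1} \langle 1|\langle j|\, |x\rangle\langle x|\, |0\rangle|j\rangle \; |1\rangle\langle 0| = \left(\frac{1}{10}\cdot\frac{7}{10} + \frac{7}{10}\cdot\frac{1}{10}\right) |1\rangle\langle 0| = \frac{7}{50} |1\rangle\langle 0|,$$

$$\sum_{j=0}^{1} \langle 1|\langle j|\, |x\rangle\langle x|\, |1\rangle|j\rangle \; |1\rangle\langle 1| = \left(\frac{1}{100} + \frac{49}{100}\right) |1\rangle\langle 1| = \frac{1}{2} |1\rangle\langle 1|.$$

So 

$$\begin{aligned}
\rho_x^1 &= \frac{1}{2}|0\rangle\langle 0| + \frac{7}{50}|1\rangle\langle 0| + \frac{7}{50}|0\rangle\langle 1| + \frac{1}{2}|1\rangle\langle 1| \\
&= \frac{1}{100} \begin{pmatrix} 50 & 14 \\ 14 & 50 \end{pmatrix} \\
&= \frac{1}{2}\left(I + \frac{14}{50} X\right),
\end{aligned}$$

corresponding to the point $(14/50, 0, 0)$ in the Bloch sphere. To compute $S(\rho_x^1)$, we note that 
$\{|+\rangle, |-\rangle\}$ is the eigenbasis of $\rho_x^1$ with eigenvalues $\frac{16}{25}$ and $\frac{9}{25}$, so 

$$S(\rho_x^1) = -\frac{16}{25} \log_2 \frac{16}{25} - \frac{9}{25} \log_2 \frac{9}{25} = 0.942\ldots$$

More directly, we could have used the relation behind equation 10.6 of section 10.1.4 to compute the eigenvalues 
from the distance $r = \frac{14}{50}$ of $\rho_x^1$ from the center of the sphere. 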


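Example 10.2.3 Let $|y\rangle = \frac{1}{10}|00\rangle + \frac{\sqrt{99}}{10}|11\rangle$ with density operator $\rho_y = |y\rangle\langle y|$. Tracing over the second qubit, we obtain

\[ \rho_y^1 = \mathrm{tr}_2(\rho_y) = \frac{1}{100}\begin{pmatrix} 1 & 0 \\ 0 & 99 \end{pmatrix} = \frac{1}{2}\left(I - \frac{49}{50} Z\right), \]

corresponding to the point $(0, 0, -49/50)$ in the Bloch sphere. Using the relation between $r = \frac{49}{50}$ and the eigenvalues given by equation 10.6 of section 10.1.3, we obtain

\[ S(\rho_y^1) = -\frac{1}{100}\log_2\frac{1}{100} - \frac{99}{100}\log_2\frac{99}{100} = 0.0807\ldots \]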
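The entropies in examples 10.2.1 through 10.2.3 can be checked numerically. The following is a minimal sketch, assuming a two-qubit pure state given by its four standard-basis amplitudes; the function names (`reduced_first_qubit`, `entropy_2x2`, `entanglement`) are illustrative and do not come from the text. It uses the fact that a trace-1 Hermitian 2x2 matrix has eigenvalues $(1 \pm r)/2$ with $r = \sqrt{1 - 4\det\rho}$.

```python
import math

def reduced_first_qubit(amps):
    """Density matrix of the first qubit of a two-qubit pure state.

    amps = (a00, a01, a10, a11) are the standard-basis amplitudes.
    Tracing over the second qubit gives rho[i][k] = sum_j a_ij * conj(a_kj).
    """
    a = [[amps[0], amps[1]], [amps[2], amps[3]]]
    return [[sum(a[i][j] * complex(a[k][j]).conjugate() for j in range(2))
             for k in range(2)] for i in range(2)]

def entropy_2x2(rho):
    """Von Neumann entropy (base 2) of a 2x2 density matrix."""
    # Eigenvalues are (1 +/- r)/2, where r is the distance of rho
    # from the center of the Bloch sphere.
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    r = math.sqrt(max(0.0, 1.0 - 4.0 * det))
    eigenvalues = [(1 + r) / 2, (1 - r) / 2]
    return -sum(p * math.log2(p) for p in eigenvalues if p > 1e-12)

def entanglement(amps):
    """Amount of entanglement S(tr_2 |psi><psi|) of a two-qubit pure state."""
    return entropy_2x2(reduced_first_qubit(amps))

s = 1 / math.sqrt(2)
bell = (s, 0, 0, s)                    # example 10.2.1
x = (0.7, 0.1, 0.1, 0.7)               # example 10.2.2
y = (0.1, 0, 0, math.sqrt(0.99))       # example 10.2.3
for name, state in [("bell", bell), ("x", x), ("y", y)]:
    print(name, round(entanglement(state), 4))
```

Up to rounding, the printed values agree with the entropies quoted in the three examples, and an unentangled state such as `(1, 0, 0, 0)` gives zero.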


To underscore how strongly the notion of entanglement depends on the subsystem decomposition under consideration, we give an example of a state with widely different von Neumann entropies with respect to two different system decompositions.


Example 10.2.4 The amount of entanglement in the four-qubit state

\[ |\psi\rangle = \frac{1}{2}(|00\rangle + |11\rangle + |22\rangle + |33\rangle) = \frac{1}{2}(|0000\rangle + |0101\rangle + |1010\rangle + |1111\rangle) \]

differs greatly for two different bipartite system decompositions.

First, consider the decomposition into the first and third qubits and the second and fourth qubits. Trace over the first subsystem. The state $\rho^\psi_{24} = \mathrm{tr}_{1,3}(|\psi\rangle\langle\psi|)$ has von Neumann entropy 0 since, by example 3.2.3, $|\psi\rangle$ can be written as the tensor product of pure states in each of these subsystems. Thus, with respect to this decomposition, the state $|\psi\rangle$ is unentangled.

Now consider the decomposition into the first and second qubits and the third and fourth qubits. Tracing over the second system yields

\[ \rho^\psi_{12} = \mathrm{tr}_{3,4}(|\psi\rangle\langle\psi|) = \sum_{i,j=0}^{3} \sum_{k=0}^{3} \langle j|\langle k|\psi\rangle\langle\psi|i\rangle|k\rangle \; |j\rangle\langle i|. \]

The coefficient of $|j\rangle\langle i|$ is $\frac{1}{4}\delta_{ij}$, so $\rho^\psi_{12}$ is the $4 \times 4$ diagonal matrix with diagonal entries all $1/4$, so $S(\mathrm{tr}_{3,4}(|\psi\rangle\langle\psi|)) = 2$. The maximum value possible for the von Neumann entropy of states of a two-qubit system is 2. Thus, with respect to this decomposition, the state $|\psi\rangle$ is maximally entangled.


While the von Neumann entropy of the partial trace with respect to one of the subsystems is 
the most common measure of entanglement for bipartite pure states, the Schmidt rank K is also 
a useful measure of entanglement. Both are nonincreasing under local operations and classical 
communication (LOCC). The Schmidt rank is a much coarser measure of entanglement than 
the von Neumann entropy of the partial trace. For two-qubit systems, the Schmidt rank merely 
distinguishes between unentangled states, with Schmidt rank 1, and entangled states, with Schmidt 
rank 2. For bipartite systems $A \otimes B$, where $A$ and $B$ are multiple-qubit systems, the Schmidt rank
is more interesting than in the single-qubit case, but it is still a coarser measure than the von 
Neumann entropy of the partial trace. 

10.2.2 Classifying Bipartite Pure States up to LOCC Equivalence 

A state $|\psi\rangle \in X$ can be converted to a state $|\phi\rangle \in X$ by local operations and classical communication (LOCC) with respect to a tensor decomposition $X = X_1 \otimes \cdots \otimes X_n$ if there exists a sequence of unitary operators and measurements on separate $X_i$ that when applied to $|\psi\rangle$ are guaranteed to result in the state $|\phi\rangle$. Which transformations are applied is allowed to depend on the outcomes of previous measurements, but is otherwise deterministic. Two states $|\psi\rangle$ and $|\phi\rangle$ are said to be LOCC equivalent with respect to the decomposition $X = X_1 \otimes \cdots \otimes X_n$ if $|\psi\rangle$ can be converted to $|\phi\rangle$ via LOCC and vice versa. An unentangled state cannot be converted to an entangled one using only LOCC.


Example 10.2.5 The Bell states $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ and $\frac{1}{\sqrt{2}}(|01\rangle + |10\rangle)$ are LOCC equivalent; simply apply $X$ to the second qubit.


Example 10.2.6 The Bell state $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ can be converted to $|00\rangle$ via LOCC, but not vice versa: measure the first qubit in the standard basis to obtain either $|00\rangle$ or $|11\rangle$, and if the result was $|11\rangle$ apply $X$ to each of the qubits.


Nielsen provides an elegant classification of pure states of bipartite systems up to LOCC equivalence, in terms of majorization of the sets of eigenvalues of the density operators of the subsystems. Let $a = (a_1, \ldots, a_m)$ and $b = (b_1, \ldots, b_m)$ be two vectors in $\mathbf{R}^m$. Let $a^{\downarrow}$ be the reordered version of $a$ such that $a_i^{\downarrow} \geq a_{i+1}^{\downarrow}$ for all $i$. We say that $b$ majorizes $a$, written $b \succeq a$, if for each $k$, $1 \leq k \leq m$, $\sum_{j=1}^{k} a_j^{\downarrow} \leq \sum_{j=1}^{k} b_j^{\downarrow}$, with the additional requirement that $\sum_{j=1}^{k} a_j^{\downarrow} = \sum_{j=1}^{k} b_j^{\downarrow}$ when $k = m$. For a pure state $|\psi\rangle$ of a bipartite system $A \otimes B$, let $\lambda^\psi = (\lambda_1^\psi, \ldots, \lambda_m^\psi)$ be the eigenvalues of $\mathrm{tr}_B |\psi\rangle\langle\psi|$. Nielsen has proved that the state $|\psi\rangle$ can be transformed to $|\phi\rangle$ by LOCC if and only if $\lambda^\psi$ is majorized by $\lambda^\phi$. Thus, $|\psi\rangle$ and $|\phi\rangle$ are LOCC equivalent if and only if $\lambda^\psi \succeq \lambda^\phi$ and $\lambda^\phi \succeq \lambda^\psi$.
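The majorization test is mechanical enough to sketch in code. The following is a minimal illustration, assuming the eigenvalue vectors are supplied as plain lists of equal length; the function names and the sample vectors in the last lines are hypothetical and chosen only to exercise the definition.

```python
def majorizes(b, a, tol=1e-12):
    """Return True if vector b majorizes vector a."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Reorder both vectors so that entries are nonincreasing
    a_sorted = sorted(a, reverse=True)
    b_sorted = sorted(b, reverse=True)
    sum_a = sum_b = 0.0
    for x, y in zip(a_sorted, b_sorted):
        sum_a += x
        sum_b += y
        # Every partial sum of a must be at most the matching partial sum of b
        if sum_a > sum_b + tol:
            return False
    # The totals must agree when k = m
    return abs(sum_a - sum_b) <= tol

def locc_convertible(lam_psi, lam_phi):
    """Nielsen's criterion: |psi> can be converted to |phi> by LOCC
    if and only if lam_psi is majorized by lam_phi."""
    return majorizes(lam_phi, lam_psi)

bell = [0.5, 0.5]        # eigenvalues for a Bell state
product = [1.0, 0.0]     # eigenvalues for an unentangled state such as |00>
print(locc_convertible(bell, product), locc_convertible(product, bell))

# Hypothetical vectors for which neither majorizes the other
u = [0.6, 0.2, 0.2]
v = [0.5, 0.4, 0.1]
print(majorizes(u, v), majorizes(v, u))
```

The first line of output reflects example 10.2.6: a Bell state can be converted to $|00\rangle$ by LOCC but not the reverse. The second shows that majorization is only a partial order.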

In the case of a bipartite system consisting of two qubits, the majorization condition reduces to a simple one. Let $|\psi\rangle$ and $|\phi\rangle$ be two states of a two-qubit system with $\lambda^\psi = (\lambda, 1 - \lambda)$ and $\lambda^\phi = (\mu, 1 - \mu)$, where $\lambda \geq 1/2$ and $\mu \geq 1/2$. Then $\lambda^\psi \succeq \lambda^\phi$ if and only if $\lambda \geq \mu$. It follows that $\lambda^\psi \succeq \lambda^\phi$ if and only if $S(\mathrm{tr}_2 |\psi\rangle\langle\psi|) \leq S(\mathrm{tr}_2 |\phi\rangle\langle\phi|)$. Thus, $|\phi\rangle$ can be converted to $|\psi\rangle$ via LOCC if and only if $|\phi\rangle$ is at least as entangled as $|\psi\rangle$. Similarly, $|\phi\rangle$ and $|\psi\rangle$ are LOCC equivalent if and only if the von Neumann entropies of the density operators for the partial trace over one of the subsystems are equal. Observe that there are infinitely many LOCC equivalence classes, and that these classes are parametrized by a continuous variable, $1/2 \leq \lambda \leq 1$.

For bipartite systems with subsystems larger than single-qubit systems, the classification is more 
complicated in that there are incomparable states. For example, if A and B are both two-qubit 
systems, the states 

IVX= ^|0>|0> + ^|l>|l> + ^|2)|2> + i|3>|3> 


and 

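\[ |\phi\rangle = \frac{2\sqrt{2}}{4}|0\rangle|0\rangle + \frac{\sqrt{6}}{4}|1\rangle|1\rangle + \frac{1}{4}|2\rangle|2\rangle + \frac{1}{4}|3\rangle|3\rangle \]

are incomparable because

\[ \lambda_1^\psi = \frac{9}{16} > \frac{1}{2} = \lambda_1^\phi, \]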


but 

X++X+= — < —=< + 4. 
1 2 16 16 12 


Nevertheless, in any bipartite system, no matter how large, the vector for any unentangled state majorizes all others. Furthermore, in any bipartite system there are maximally entangled states $|\psi\rangle$ for which $\lambda^\psi$ is majorized by $\lambda^\phi$ for all states $|\phi\rangle$. Let $X$ be a bipartite system $X = A \otimes B$ where $A$ and $B$ have dimensions $n$ and $m$ respectively, with $n \geq m$. Let $|\psi\rangle$ be a state of the form

\[ |\psi\rangle = \sum_{i=1}^{m} \frac{1}{\sqrt{m}} |\phi_i^A\rangle \otimes |\phi_i^B\rangle, \]

where the $\{|\phi_i^A\rangle\}$ and $\{|\phi_i^B\rangle\}$ are orthonormal sets, and since $m$ is the dimension of $B$, the set $\{|\phi_i^B\rangle\}$ is a basis for $B$. The vector $\lambda^\psi$ is majorized by $\lambda^\phi$ for all states $|\phi\rangle \in A \otimes B$. Furthermore, as one would expect, these maximally entangled states have maximal Schmidt rank, and the von Neumann entropy after tracing over either subsystem is the maximum possible value. These states fulfill our current expectations for maximally entangled states in every way. We shall see, however, that for multipartite states it is nevertheless highly unclear what maximally entangled should mean.

10.2.3 Quantifying Entanglement in Bipartite Mixed States 

Before discussing entanglement in multipartite quantum systems, we take a brief look at the meaning of entanglement for mixed states. A mixed state $\rho$ of a quantum system $V_1 \otimes \cdots \otimes V_n$ is separable with respect to this tensor decomposition if it can be written as a probabilistic mixture of unentangled states: $\rho$ is separable if it can be written as

\[ \rho = \sum_{j=1}^{m} p_j \, |\phi_j^{(1)}\rangle\langle\phi_j^{(1)}| \otimes \cdots \otimes |\phi_j^{(n)}\rangle\langle\phi_j^{(n)}|, \]

where $|\phi_j^{(i)}\rangle \in V_i$ and $p_j \geq 0$ with $\sum_j p_j = 1$. For a given $i$, the various $|\phi_j^{(i)}\rangle$ need not be orthogonal. If a mixed state $\rho$ cannot be written as above, it is said to be entangled.

This definition may appear more complicated than expected; why not say a mixed state $\rho$ is entangled if it cannot be written as $\rho_1 \otimes \cdots \otimes \rho_n$? The more involved definition distinguishes entanglement from mere classical correlation. For example, the mixed state $\rho_{cc} = \frac{1}{2}|00\rangle\langle 00| + \frac{1}{2}|11\rangle\langle 11|$ is classically correlated (it cannot be written as $\rho_1 \otimes \cdots \otimes \rho_n$), but is not entangled. The state $\rho_{\Phi^+} = \frac{1}{2}(|00\rangle + |11\rangle)(\langle 00| + \langle 11|)$ is entangled. Appendix A discusses quantum entanglement versus classical correlations in more detail.

If a mixed state $\rho$ can be written as a probabilistic mixture of entangled states, it is not necessarily entangled; it still may be separable. For example, consider

\[ \rho = \frac{1}{2}|\Phi^+\rangle\langle\Phi^+| + \frac{1}{2}|\Phi^-\rangle\langle\Phi^-|, \]

where $|\Phi^+\rangle$ and $|\Phi^-\rangle$ are the Bell states $|\Phi^+\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ and $|\Phi^-\rangle = \frac{1}{\sqrt{2}}(|00\rangle - |11\rangle)$. We defined the mixed state $\rho$ as a probabilistic mixture of maximally entangled states, but it is easy to check that it can also be written as

\[ \rho = \frac{1}{2}|00\rangle\langle 00| + \frac{1}{2}|11\rangle\langle 11|, \]

a probabilistic mixture of product states, so $\rho$ is actually separable.
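The claim that the two mixtures coincide can be confirmed entry by entry. Here is a small sketch, assuming state vectors are lists of four standard-basis amplitudes; the helper names `outer` and `mix` are illustrative only.

```python
import math

def outer(v):
    """Density matrix |v><v| of a state vector v."""
    return [[x * complex(y).conjugate() for y in v] for x in v]

def mix(weighted):
    """Probabilistic mixture sum_k p_k rho_k of equally sized matrices."""
    n = len(weighted[0][1])
    return [[sum(p * rho[i][j] for p, rho in weighted) for j in range(n)]
            for i in range(n)]

s = 1 / math.sqrt(2)
phi_plus = [s, 0, 0, s]      # (|00> + |11>)/sqrt(2)
phi_minus = [s, 0, 0, -s]    # (|00> - |11>)/sqrt(2)
ket00 = [1, 0, 0, 0]
ket11 = [0, 0, 0, 1]

rho_bell = mix([(0.5, outer(phi_plus)), (0.5, outer(phi_minus))])
rho_prod = mix([(0.5, outer(ket00)), (0.5, outer(ket11))])

# The cross terms |00><11| and |11><00| cancel between the two Bell states
same = all(abs(rho_bell[i][j] - rho_prod[i][j]) < 1e-12
           for i in range(4) for j in range(4))
print(same)
```

The two matrices agree because the off-diagonal terms of the two Bell-state projectors have opposite signs and cancel in the equal mixture.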

There are a number of useful measures of entanglement for mixed bipartite states, all of which coincide with the standard measure of entanglement on pure states, the von Neumann entropy of the density operator for one of the subsystems. We give a rough description of a few of these measures. The amount of distillable entanglement contained in a mixed state $\rho$ is the asymptotic ratio $m/n$ of the maximum number $m$ of maximally entangled states $\rho_{ME}$ that can be obtained from $n$ copies of $\rho$ by LOCC. Conversely, the entanglement cost is the asymptotic ratio $n/m$ of the minimum number $n$ of copies of a maximally entangled state $\rho_{ME}$ needed to produce $m$ copies of $\rho$ using only LOCC. The relative entropy of entanglement can be thought of as measuring how close $\rho$ is to a separable state $\rho_S$, and is defined to be

\[ \inf_{\rho_S \in S} \mathrm{tr}[\rho(\log\rho - \log\rho_S)], \]

where the infimum is over all separable states $\rho_S$. It is known that not only is the distillable entanglement never greater than the cost of entanglement, but also for most mixed states the distillable entanglement is strictly less than the cost of entanglement. In particular, there exist bound entangled states from which no entanglement can be distilled, but whose entanglement cost is non-zero. The study of entanglement in mixed bipartite states is a rich area of research, with many known results not described here, but also with many remaining open questions. Even the relationship between the measures we just described is not fully understood.

10.2.4 Multipartite Entanglement 

Researchers are continuing to develop new measures of entanglement and explore properties of states entangled with respect to tensor decompositions into more than two subsystems. For quantum computation, we are particularly interested in properties of $n$-qubit states for large $n$, and measures of entanglement for these states with respect to the decomposition into the individual qubit systems. In spite of broad recognition that understanding multipartite entanglement is crucial for understanding the power and limitations of quantum computation, much remains unknown. Entangled states provide a fundamental resource for other types of quantum information processing, such as teleportation and dense coding, as well as quantum computation. Which types of entangled states are most useful for which types of quantum information processing tasks is an active area of research.

Even for pure states of the simplest multipartite systems, three-qubit systems, quantifying entanglement is complicated. We just saw that for two-qubit systems there are infinitely many LOCC equivalence classes. However, relaxing the LOCC condition simplifies the picture somewhat. A state $|\psi\rangle$ can be converted to $|\phi\rangle$ by stochastic local operations and classical communication (SLOCC) if there is a sequence of local operations with classical communications that with non-zero probability turns $|\psi\rangle$ into $|\phi\rangle$. States $|\psi\rangle$ and $|\phi\rangle$ are SLOCC equivalent if $|\psi\rangle$ can be converted to $|\phi\rangle$ by SLOCC and vice versa. Under SLOCC equivalence, the two-qubit case reduces to two classes: entangled states and unentangled states.

For a three-qubit system $X = A \otimes B \otimes C$, the SLOCC classification of states with respect to the decomposition into the three systems has six distinct SLOCC classes:

• unentangled states,

• A-BC decomposable states,

• B-AC decomposable states,

• C-AB decomposable states,

• SLOCC equivalent to $|GHZ_3\rangle = \frac{1}{\sqrt{2}}(|000\rangle + |111\rangle)$, and

• SLOCC equivalent to $|W_3\rangle = \frac{1}{\sqrt{3}}(|001\rangle + |010\rangle + |100\rangle)$.

[Diagram, top level: $|GHZ_3\rangle$ equivalent, $|W_3\rangle$ equivalent; middle level: A-BC, B-CA, C-AB; bottom level: unentangled]

Figure 10.1

Partial order of SLOCC classes for a three-qubit system, where states in upper classes can be converted to states in lower classes using SLOCC, but the two uppermost classes cannot be converted to each other.


A state $|\psi\rangle$ being in the A-BC decomposable class means that it can be written as $|\psi\rangle = |\psi_A\rangle \otimes |\psi_{BC}\rangle$ for $|\psi_A\rangle \in A$ and $|\psi_{BC}\rangle \in B \otimes C$, but it cannot be fully decomposed into $|\psi\rangle = |\psi_A\rangle \otimes |\psi_B\rangle \otimes |\psi_C\rangle$, where $|\psi_A\rangle \in A$, $|\psi_B\rangle \in B$ and $|\psi_C\rangle \in C$.

A partial order on these six classes is shown in figure 10.1: a state $|\psi\rangle$ is contained in a class above that of state $|\phi\rangle$ if there exists an SLOCC sequence taking $|\psi\rangle$ to $|\phi\rangle$ but not vice versa. There are two inequivalent classes of states at the top of the hierarchy; $|GHZ_3\rangle$ cannot be converted to $|W_3\rangle$ or the other way around. It is not clear whether $|GHZ_3\rangle$ or $|W_3\rangle$ should be considered more entangled; each appears to be highly entangled in some ways and less so in others. To illustrate the distinct types of entanglement these states embody, we look at these states in terms of the persistency and connectedness of their entanglement.

Persistency of entanglement The persistency of entanglement of $|\psi\rangle \in V \otimes \cdots \otimes V$ is the minimum number of qubits, $P_e$, that need to be measured to guarantee that the resulting state is unentangled.


Maximal connectedness A state $|\psi\rangle \in V \otimes \cdots \otimes V$ is maximally connected if for any two qubits there exists a sequence of single-qubit measurements on the other qubits that when performed guarantee that the two qubits end up in a maximally entangled state.

Let $|GHZ_n\rangle$ be the $n$-qubit state

\[ |GHZ_n\rangle = \frac{1}{\sqrt{2}}(|00\ldots 0\rangle + |11\ldots 1\rangle), \]

and $|W_n\rangle$ be the $n$-qubit state

\[ |W_n\rangle = \frac{1}{\sqrt{n}}(|0\ldots 001\rangle + |0\ldots 010\rangle + |0\ldots 100\rangle + \cdots + |1\ldots 000\rangle). \]

Because only one qubit needs to be measured to reduce $|GHZ_n\rangle$ to an unentangled state, the persistency of entanglement of $|GHZ_n\rangle$ is only 1, so in this sense it is not very entangled. On the other hand, $|GHZ_n\rangle$ is maximally connected. It is relatively easy to check that the states $|W_n\rangle$ are not maximally connected. Yet, they do have high persistency: $P_e(|W_n\rangle) = n - 1$. Thus, whether $|GHZ_n\rangle$ or $|W_n\rangle$ should be considered more entangled depends on what properties of entanglement one is interested in.

For $n \geq 4$, the situation becomes far more complicated. For $n \geq 4$ there are infinitely many SLOCC equivalence classes, and these classes are parametrized by continuous parameters. As $n$ increases, it becomes less and less clear which states should be considered maximally entangled.

Cluster States A class of $n$-qubit entangled states, cluster states, combine properties of both $|GHZ_n\rangle$ and $|W_n\rangle$ states. The $|GHZ_n\rangle$ states are maximally connected but have persistency of only 1. The persistency of the $|W_n\rangle$ states increases with $n$, but they are not maximally connected. Cluster states are maximally connected and have persistency increasing with $n$. Cluster states form a universal entanglement resource for quantum computation that is the basis for cluster state, or one-way, quantum computing, an alternative model of quantum computing discussed in chapter 13.

Let $G$ be any finite graph whose vertices are qubits. The neighborhood of any vertex $v \in G$, $nbhd(v)$, is the set of vertices $w$ connected to $v$ by an edge of the graph. An operator $O$ stabilizes a state $|\psi\rangle$ if $O|\psi\rangle = |\psi\rangle$. The graph state $|G\rangle$ corresponding to a graph $G$ is the state stabilized by the set of operators, one for each vertex $v$ of $G$,

\[ X^{(v)} \otimes \bigotimes_{i \in nbhd(v)} Z^{(i)}, \qquad (10.7) \]

where $X = |1\rangle\langle 0| + |0\rangle\langle 1|$ and $Z = |0\rangle\langle 0| - |1\rangle\langle 1|$ are the familiar Pauli operators, and the superscript on these operators indicates to which qubit the operator is applied. If the graph $G$ is a $d$-dimensional rectangular lattice, then $|G\rangle$ is called a cluster state (see figure 10.2). There is some discrepancy in terminology in the literature; sometimes cluster states is taken to be synonymous with graph states. Graph states, including cluster states, can be constructed as follows. For each vertex, begin with a qubit in state $|+\rangle$. Then for each edge in the graph apply the controlled phase operator $C_P = |00\rangle\langle 00| + |01\rangle\langle 01| + |10\rangle\langle 10| - |11\rangle\langle 11|$. Since the controlled phase operator is symmetric on the qubits, and the applications of the controlled phase all commute with each other, it does not matter in which order the operators are applied. Here we consider only the states stabilized by the operators $X^{(v)} \otimes \bigotimes_{i \in nbhd(v)} Z^{(i)}$, but some expositions consider all states which are joint eigenstates of these operators.


Example 10.2.7 Construction of the cluster state for a $1 \times 2$ lattice. Apply $C_P$ to $|+\rangle|+\rangle$ to obtain the cluster state

\[ |\phi_2\rangle = \frac{1}{2}(|00\rangle + |01\rangle + |10\rangle - |11\rangle) = \frac{1}{\sqrt{2}}(|+\rangle|0\rangle + |-\rangle|1\rangle) = \frac{1}{\sqrt{2}}(|0\rangle|+\rangle + |1\rangle|-\rangle). \]

This state is LOCC equivalent to a Bell state.

Figure 10.2

A 4 x 5 rectangular lattice. Sample operators that define a cluster state are shown, one for an internal node, one for a node on the boundary but not at a corner, and one for a corner node. A cluster state is a state that is a simultaneous eigenstate of all such operators, one for each node of the lattice.


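Example 10.2.8 Cluster state for a $1 \times 3$ lattice. The operator $(C_P \otimes I)(I \otimes C_P)$ applied to $|+\rangle|+\rangle|+\rangle$ results in the cluster state

\[ |\phi_3\rangle = (C_P \otimes I)\left(\frac{1}{\sqrt{2}}|+\rangle(|0\rangle|+\rangle + |1\rangle|-\rangle)\right) \]

\[ = \frac{1}{2}(|0\rangle|0\rangle|+\rangle + |0\rangle|1\rangle|-\rangle + |1\rangle|0\rangle|+\rangle - |1\rangle|1\rangle|-\rangle) \]

\[ = \frac{1}{\sqrt{2}}(|+\rangle|0\rangle|+\rangle + |-\rangle|1\rangle|-\rangle). \]

This state is LOCC equivalent to $|GHZ_3\rangle$.


Example 10.2.9 Cluster state for a $1 \times 4$ lattice.

\[ |\phi_4\rangle = \frac{1}{2}(|0\rangle|+\rangle|0\rangle|+\rangle + |1\rangle|-\rangle|0\rangle|+\rangle + |0\rangle|-\rangle|1\rangle|-\rangle + |1\rangle|+\rangle|1\rangle|-\rangle) \]

\[ = \frac{1}{2}(|+\rangle|0\rangle|+\rangle|0\rangle + |-\rangle|1\rangle|-\rangle|0\rangle + |+\rangle|0\rangle|-\rangle|1\rangle + |-\rangle|1\rangle|+\rangle|1\rangle). \]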








The reader should check that each of these states is stabilized by all of the operators of equation 
10.7. 
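The stabilizer check suggested above can also be carried out by brute force for small graphs. The sketch below assumes qubits are numbered from 0 with qubit 0 as the leftmost bit, and represents a graph as a list of edges; these conventions and the function names are assumptions made for illustration. It builds the state by the construction described earlier (start from $|+\rangle$ on every vertex, then apply a controlled phase for each edge) and then applies each operator of equation 10.7.

```python
import math

def graph_state(n, edges):
    """Amplitudes of the graph state on n qubits.

    Starting from |+>^n, each controlled phase flips the sign of the basis
    states in which both endpoint qubits are 1.
    """
    norm = 1 / math.sqrt(2 ** n)
    amps = []
    for x in range(2 ** n):
        bits = [(x >> (n - 1 - q)) & 1 for q in range(n)]
        sign = (-1) ** sum(bits[u] * bits[v] for u, v in edges)
        amps.append(sign * norm)
    return amps

def apply_stabilizer(amps, n, v, edges):
    """Apply X on qubit v and Z on every neighbor of v."""
    nbhd = [b if a == v else a for a, b in edges if v in (a, b)]
    out = [0.0] * len(amps)
    for x, amp in enumerate(amps):
        bits = [(x >> (n - 1 - q)) & 1 for q in range(n)]
        sign = (-1) ** sum(bits[w] for w in nbhd)   # Z factors on neighbors
        y = x ^ (1 << (n - 1 - v))                  # X flips qubit v
        out[y] += sign * amp
    return out

def is_stabilized(n, edges):
    """Check that every vertex operator leaves the graph state unchanged."""
    amps = graph_state(n, edges)
    return all(
        all(abs(p - q) < 1e-12
            for p, q in zip(apply_stabilizer(amps, n, v, edges), amps))
        for v in range(n)
    )

print(graph_state(2, [(0, 1)]))                       # 1 x 2 lattice
print(is_stabilized(3, [(0, 1), (1, 2)]))             # 1 x 3 lattice
print(is_stabilized(4, [(0, 1), (1, 2), (2, 3)]))     # 1 x 4 lattice
```

The amplitudes for the $1 \times 2$ lattice match example 10.2.7. Because the state vector has $2^n$ entries, this direct check is only practical for small lattices.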

Briegel and Raussendorf give a straightforward proof that all cluster states are maximally connected. A more involved argument shows that cluster states $|\phi_n\rangle$ have persistency $\lfloor n/2 \rfloor$. Thus, while the persistency of $|\phi_n\rangle$ is not as great as the persistency of $|W_n\rangle$, the persistency of cluster states does increase linearly with the number of qubits, and unlike $|W_n\rangle$, cluster states are maximally connected. Thus cluster states combine entanglement strengths of both the $|GHZ_n\rangle$ states and the $|W_n\rangle$ states. In section 13.4.1, we briefly return to cluster states to describe the use of their entanglement as a quantum computational resource. The following table summarizes the situation:

                      max connected    P_e
    |GHZ_n>           YES              1
    |phi_n>           YES              floor(n/2)
    |W_n>             NO               n - 1

10.3 Density Operator Formalism for Measurement 

An analysis of a quantum algorithm or protocol that involves measurement must take into account 
all possible outcomes of any measurement. Up to now we have had only an awkward way of 
describing the result of a future measurement: listing the possible outcomes and their respective 
probabilities. Density operators provide a compact and elegant way to model the probabilistic 
outcomes of a measurement yet to be performed or of a measurement for which the outcome is 
unknown. 

Density operators provide a means of compactly expressing a probability distribution over quantum states or the statistical properties of an ensemble of quantum states. If the reader has not yet read appendix A on the relations between probability theory and quantum mechanics, now would be a good time to do so. The following game motivates the use of density operators in this context. Keep in mind the definition of a state given in section 10.1: a state encapsulates "all information about the system that can be gained from any number of measurements on a supply of identical quantum systems." Suppose you are told you will be sent a sequence of qubits and that either all members of the sequence are the first qubit of a Bell state $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ or that a random sequence of $|0\rangle$ and $|1\rangle$ are sent, with $|0\rangle$ and $|1\rangle$ having equal probability. Your job is to determine which type of sequence you are receiving. What is your strategy?

It is impossible to do better than guessing randomly; without access to more information, there is no way to distinguish the two sequences. If you were given access to the second qubit of each Bell pair in the first case and a second copy of each qubit in the second, a winning strategy is possible. But without access to the second qubit, the two sequences are indistinguishable. To see why, recall from section 10.1.1 that the density operator for one qubit of a Bell pair is $\frac{1}{2} I$, and that the density operators for $|0\rangle$ and $|1\rangle$ are

\[ |0\rangle\langle 0| = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad |1\rangle\langle 1| = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \]

respectively.

From appendix A, the density operator $\rho$ for a 50-50 probability distribution over the states $|0\rangle$ and $|1\rangle$ is

\[ \rho = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + \frac{1}{2}\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = \frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|1\rangle\langle 1| = \frac{1}{2} I. \]

So the density operator of one qubit of a Bell pair is the same as the mixed state of a 50-50 probability distribution over the states $|0\rangle$ and $|1\rangle$.

More generally, a probability distribution over quantum states where $|\psi_i\rangle$ has probability $p_i$ is represented by the density operator

\[ \rho = \sum_{i=0}^{k-1} p_i |\psi_i\rangle\langle\psi_i|. \]

This representation works even if the states $|\psi_i\rangle$ are not mutually orthogonal. Probability distributions over quantum states have appeared frequently in this book to describe the possible outcomes of a measurement. Density operators provide a concise representation, one that can be manipulated directly to see the effects of subsequent unitary transformations and measurements. Given an orthogonal set $\{|x_i\rangle\}$ of the possible outcomes of a measurement of a specific state $|x\rangle$, with $p_i$ being the probability of each outcome, the density operator representing this probability distribution over quantum states is

\[ \rho = \sum_i p_i |x_i\rangle\langle x_i|. \]

It is easy to check that $\rho$ is Hermitian, trace 1, and positive, so $\rho$ is a density operator. The density operator $\rho = \sum_i p_i |x_i\rangle\langle x_i|$ summarizes the possible results of a measurement as a probabilistic mixture of the density operators for the possible resulting pure states weighted by the probability of the outcomes.


10.3.1 Measurement of Density Operators 

This section discusses the meaning of and notation for measurement of density operators. The measurement of mixed states directly generalizes that of pure states. First, we write the familiar measurement of pure states in terms of density operators. Let $|x\rangle$ be an element of an $N = 2^n$ dimensional vector space $X$ with corresponding density operator $\rho_x = |x\rangle\langle x|$. Measuring $|x\rangle$ with an operator $O$ that has $K$ associated projectors $P_j$ yields with probability $p_j = \langle x|P_j|x\rangle$ the state

\[ \frac{P_j|x\rangle}{|P_j|x\rangle|} = \frac{1}{\sqrt{p_j}} P_j|x\rangle. \]

The density operator for each of these states is

\[ \rho_x^j = \frac{1}{p_j} P_j|x\rangle\langle x|P_j^\dagger = \frac{1}{p_j} P_j \rho_x P_j^\dagger, \]

so the density operator $\rho_x^O$ summarizing the possible outcomes of the measurement is

\[ \rho_x^O = \sum_j p_j \rho_x^j = \sum_j P_j \rho_x P_j^\dagger. \]

When $\rho_x$ is written in an eigenbasis for the measuring operator $O$, the result $\rho_x^O$ of measurement with $O$ is particularly easy to see. Let $\{|\alpha_i\rangle\}$ be an eigenbasis for $O$ that contains the vectors $\frac{P_j|x\rangle}{|P_j|x\rangle|}$ as the first $K$ elements of an $N$-element basis. In this basis,

\[ |x\rangle = \sum_{i=0}^{K-1} x_i |\alpha_i\rangle = \sum_{j=0}^{K-1} \sqrt{p_j}\, \frac{P_j|x\rangle}{|P_j|x\rangle|}, \]

where $x_i = \sqrt{p_i}$ for $i < K$, and $x_i = 0$ for $i \geq K$. So

\[ \rho_x = |x\rangle\langle x| = \left(\sum_{i=0}^{N-1} x_i|\alpha_i\rangle\right)\left(\sum_{j=0}^{N-1} \overline{x_j}\langle\alpha_j|\right) = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x_i\overline{x_j}\, |\alpha_i\rangle\langle\alpha_j|; \]

the $ij$th entry of the matrix for $\rho_x$ in basis $\{|\alpha_i\rangle\}$ is $x_i\overline{x_j}$. The density operator $\rho_x^O$ is

\[ \rho_x^O = \sum_j x_j\overline{x_j}\, |\alpha_j\rangle\langle\alpha_j| = \sum_j P_j|x\rangle\langle x|P_j^\dagger, \]

so $\rho_x^O$ is obtained from $\rho_x$ by removing all the cross terms; the matrix for $\rho_x^O$ in the basis $\{|\alpha_i\rangle\}$ is the matrix $\rho_x$ with the off-diagonal entries replaced with zeros.
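As a concrete illustration of the cross terms disappearing, the sketch below applies the map $\rho \mapsto \sum_j P_j \rho P_j^\dagger$ to a single-qubit pure state measured in the standard basis. The state $0.6|0\rangle + 0.8|1\rangle$ and the helper names are hypothetical choices made for this illustration.

```python
def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose of a matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def measure_unknown_outcome(rho, projectors):
    """Density operator sum_j P_j rho P_j^dagger for a measurement whose
    outcome is not (yet) known."""
    n = len(rho)
    result = [[0j] * n for _ in range(n)]
    for p in projectors:
        term = matmul(matmul(p, rho), dagger(p))
        for i in range(n):
            for j in range(n):
                result[i][j] += term[i][j]
    return result

x = [0.6, 0.8]                                   # hypothetical state
rho_x = [[xi * xj for xj in x] for xi in x]      # |x><x| (real amplitudes)
p0 = [[1, 0], [0, 0]]                            # projector onto |0>
p1 = [[0, 0], [0, 1]]                            # projector onto |1>

rho_o = measure_unknown_outcome(rho_x, [p0, p1])
for row in rho_o:
    print([round(entry.real, 2) for entry in row])
```

The diagonal entries of `rho_o` are the outcome probabilities 0.36 and 0.64, and the off-diagonal entries 0.48 of `rho_x` have been replaced with zeros.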

Measurement of mixed states is easily derived from that of pure states. Let $\rho$ be a density operator. Using results from section 10.1.1, $\rho$ can be viewed as a probabilistic mixture $\rho = \sum_i q_i |\psi_i\rangle\langle\psi_i|$ of pure states $|\psi_i\rangle$. Measuring the mixed state $\rho$ can be viewed as measuring $|\psi_i\rangle\langle\psi_i|$ with probability $q_i$, so the measurement outcomes are encapsulated by the density operator $\rho'$, a probabilistic mixture of the density operators representing the possible outcomes of measuring each $|\psi_i\rangle\langle\psi_i|$:

\[ \rho'_i = \sum_j P_j |\psi_i\rangle\langle\psi_i| P_j^\dagger. \]

Thus, the density operator $\rho'$ for the possible outcomes of measuring the mixed state $\rho$ is

\[ \rho' = \sum_i q_i \rho'_i = \sum_i q_i \sum_j P_j |\psi_i\rangle\langle\psi_i| P_j^\dagger = \sum_j P_j \left(\sum_i q_i |\psi_i\rangle\langle\psi_i|\right) P_j^\dagger = \sum_j P_j \rho P_j^\dagger. \]

The term $P_j \rho P_j^\dagger$ is not a density operator in general; it is positive and Hermitian, but its trace may be less than one. Because the trace of a positive, Hermitian operator is zero only if the operator is the zero operator, $\rho'$ may be viewed as a probabilistic mixture of density operators $\rho_j = \frac{P_j \rho P_j^\dagger}{\mathrm{tr}(P_j \rho P_j^\dagger)}$ with weighting $p_j = \mathrm{tr}(P_j \rho P_j^\dagger)$:

\[ \rho' = \sum_j p_j \rho_j = \sum_j p_j \frac{P_j \rho P_j^\dagger}{\mathrm{tr}(P_j \rho P_j^\dagger)}, \]

where we ignore the zero terms. For a pure state $|\psi\rangle$ with density operator $\rho = |\psi\rangle\langle\psi|$,

\[ \rho' = \sum_j p_j \frac{P_j \rho P_j^\dagger}{\langle\psi|P_j|\psi\rangle} \]

because

\[ \mathrm{tr}(P_j|\psi\rangle\langle\psi|P_j^\dagger) = \langle\psi|P_j^\dagger P_j|\psi\rangle = \langle\psi|P_j|\psi\rangle \]

by the trace trick of box 10.1 and the properties of projection operators.

Both measurements with known outcome and measurements that have yet to be performed or for which the outcome is not known can be concisely represented by density operators. Suppose we measure $|x\rangle$ with operator $O$ and obtain outcome $|\psi\rangle = \frac{P_j|x\rangle}{|P_j|x\rangle|}$ with density operator $\rho_\psi = |\psi\rangle\langle\psi|$. There are two different representations for the result of this measurement, $\rho_\psi$ and $\rho_x^O$. Which should we use? If we do not know the measurement outcome, we must use $\rho_x^O$. If we do know the outcome, we should use $\rho_\psi$. We can use the density operator $\rho_x^O$, but $\rho_\psi$ encapsulates more of the information we know. If we were to use $\rho_x^O$, the outcome of the measurement must be kept track of separately, and since $\rho_x^O$ allows for more possibilities, using it means performing unnecessary calculations involving possibilities that did not happen. The same distinction arises
when sampling from a probability distribution; before the sample is taken, or if the sample is taken 
but the outcome is unknown, the best model for the sample is the probability distribution itself. But 
once the outcome is known, the sample is best modeled by the known value. Appendix A discusses 
such relations between the classical and quantum situations. While issues with measurement 
connect with the deepest issues in quantum mechanics, the distinction between these two models 
for measurement outcomes is not one of these issues. The deeper questions involve when and how 
a measurement outcome becomes known and by whom. We do not elaborate on these quantum 
mechanical issues here. 


10.4 Transformations of Quantum Subsystems and Decoherence 

Density operators were introduced to enable us to better discuss quantum subsystems. In the preceding section, we used them fruitfully to gain insight into entanglement. So far we have only used them to discuss static situations. We turn now to dynamics. In the first two parts of the book, we discussed quantum systems modeled by pure states that are acted upon by unitary operators. As we saw in section 10.1, to discuss quantum subsystems, we needed to expand from considering only pure states to considering density operators. Similarly, to discuss the dynamics of quantum subsystems, we need to expand from considering only unitary operators to a more general class of operators. Section 10.4.1 develops superoperators, this more general class of operators, by considering unitary operators on the entire system and looking to see what can be understood about their effect on a subsystem. Section 10.4.2 describes a decomposition that gives insight into superoperators. Section 10.4.3 discusses superoperators corresponding to measurements. Section 10.4.4 makes use of the superoperator formalism to discuss decoherence, errors caused by the interaction of the quantum system under consideration with the environment. This discussion of decoherence provides the setting for the discussion of quantum error correction in chapter 11.

10.4.1 Superoperators 

This section considers the dynamics of subsystems. Section 10.1 first considers the case in which the subsystem $A$ is the whole system ($A = X$), and then considers the general case. Here, we first consider a unitary operator acting on a system $X$. In the original notation for pure states, the unitary operator $U$ applied to $X$ takes $|\psi\rangle$ to $U|\psi\rangle$. The density operator for a pure state $|\psi\rangle$ is $\rho = |\psi\rangle\langle\psi|$, so $U$ takes $\rho$ to $U|\psi\rangle\langle\psi|U^\dagger = U\rho U^\dagger$. The general case, in which $A$ is a subsystem of $X = A \otimes B$, is more complicated. Suppose $|\psi\rangle \in X = A \otimes B$ and $U : X \to X$. Then the density operator $\rho_A = \mathrm{tr}_B |\psi\rangle\langle\psi|$ is sent to $\rho'_A = \mathrm{tr}_B(U|\psi\rangle\langle\psi|U^\dagger)$. When $U = U_A \otimes U_B$, $\rho'_A$ can be deduced from just $\rho_A$ and $U$, and it will be $\rho'_A = U_A \rho_A U_A^\dagger$. For a general unitary operator $U$, however, it is not possible to deduce $\rho'_A$ from only $U$ and $\rho_A$; the density operator $\rho'_A$ depends on the original state $|\psi\rangle$ of the whole system. Two examples illustrate this point.


Example 10.4.1 Let $X = A \otimes B$, where $A$ and $B$ are both single-qubit systems. Suppose $\rho_A = |0\rangle\langle 0|$, and $U = C_{not}$ where $B$ is the control qubit and $A$ the target:

\[ U = |00\rangle\langle 00| + |11\rangle\langle 01| + |10\rangle\langle 10| + |01\rangle\langle 11|. \]

The density operator $\rho_A$ for subsystem $A$ is consistent with many possible states of the entire system $X$, including $|\psi_0\rangle = |00\rangle$, $|\psi_1\rangle = |01\rangle$, and $|\psi_2\rangle = \frac{1}{\sqrt{2}}|0\rangle(|0\rangle + |1\rangle)$. What is the density operator $\rho'_A$ for system $A$ after $U$ has been applied? If the state of the entire system is $|\psi_0\rangle = |00\rangle$, then $\rho'_A = |0\rangle\langle 0|$. But if it were $|\psi_1\rangle = |01\rangle$, then $\rho'_A = |1\rangle\langle 1|$, or if it were $|\psi_2\rangle = \frac{1}{\sqrt{2}}|0\rangle(|0\rangle + |1\rangle)$, then $\rho'_A = \frac{1}{2} I$.
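Example 10.4.1 is small enough to verify directly. The sketch below assumes amplitudes are ordered as $(a_{00}, a_{01}, a_{10}, a_{11})$ with the first bit belonging to $A$ and the second to $B$; the function names are illustrative only.

```python
import math

def apply_cnot_b_controls_a(amps):
    """Apply U = |00><00| + |11><01| + |10><10| + |01><11|."""
    a00, a01, a10, a11 = amps
    # |01> and |11> are exchanged; |00> and |10> are unchanged
    return (a00, a11, a10, a01)

def reduced_a(amps):
    """rho_A = tr_B |psi><psi| for a two-qubit pure state."""
    a = [[amps[0], amps[1]], [amps[2], amps[3]]]
    return [[sum(a[i][j] * complex(a[k][j]).conjugate() for j in range(2))
             for k in range(2)] for i in range(2)]

s = 1 / math.sqrt(2)
states = {
    "psi0": (1, 0, 0, 0),     # |00>
    "psi1": (0, 1, 0, 0),     # |01>
    "psi2": (s, s, 0, 0),     # |0>(|0> + |1>)/sqrt(2)
}

# All three global states give the same rho_A before U is applied,
# yet different rho_A afterwards.
results = {name: reduced_a(apply_cnot_b_controls_a(psi))
           for name, psi in states.items()}
for name, rho in results.items():
    print(name, [[round(entry.real, 3) for entry in row] for row in rho])
```

The three outputs are $|0\rangle\langle 0|$, $|1\rangle\langle 1|$, and $\frac{1}{2} I$ respectively, even though every input has $\rho_A = |0\rangle\langle 0|$, which is exactly why $\rho'_A$ cannot be deduced from $U$ and $\rho_A$ alone.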


In fact, the resulting mixed state $\rho'_A$ may have no relation with the initial mixed state $\rho_A$.


Example 10.4.2 Consider the unitary operator

\[ U_{switch} = |00\rangle\langle 00| + |10\rangle\langle 01| + |01\rangle\langle 10| + |11\rangle\langle 11| \]

acting on single-qubit systems $A$ and $B$. The transformation exchanges the states of the two systems. Suppose system $A$ is originally in state $\rho_A = |\psi\rangle\langle\psi|$ and system $B$ is in state $|0\rangle\langle 0|$. After applying $U$, the resulting state of system $A$ is $|0\rangle\langle 0|$ no matter what $|\psi\rangle$ is.


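Let $\mathcal{D}_A$ be the set of all density operators for subsystem $A$. When initially subsystem $A$ is not 
entangled with subsystem $B$, and subsystem $B$ is in state $|\phi_B\rangle$, a unitary operator $U : X \to X$ 
induces a transformation $S_U^{\phi_B} : \mathcal{D}_A \to \mathcal{D}_A$. Specifically, the unitary transformation 

\[ U : X \to X, \qquad |\psi\rangle \mapsto U|\psi\rangle \]

induces 

\[ S_U^{\phi_B} : \mathcal{D}_A \to \mathcal{D}_A, \qquad \rho_A \mapsto \rho'_A, \]

where $\rho_A = \mathrm{tr}_B |\psi\rangle\langle\psi|$ and $\rho'_A = \mathrm{tr}_B(U|\psi\rangle\langle\psi|U^\dagger)$. Induced transformations such as $S_U^{\phi_B}$ are called 
superoperators. 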

Superoperators are linear: the effect of a superoperator $S$ on any density operator $\rho$ that is a 
probabilistic mixture of other density operators, $\rho = \sum_i p_i \rho_i$, is the sum of the superoperator 
applied to each of the components: 

\[ S : \rho \mapsto \sum_i p_i S(\rho_i). \]

10.4.2 Operator Sum Decomposition 

Given a superoperator $S : \mathcal{D}_A \to \mathcal{D}_A$, it would be handy to describe it just in terms of 
system $A$ and formalisms we already have for operators on $A$. General superoperators, however, 
are not of the form $U\rho U^\dagger$ for some unitary operator. They are not even reversible 
in general: from example 10.4.2, for $U = U_{switch}$ and $|\phi\rangle = |0\rangle$, $S_U^\phi$ takes $\rho_A = |\psi\rangle\langle\psi|$ to 
$\rho'_A = |0\rangle\langle 0|$ for all $|\psi\rangle$. Furthermore, most superoperators are not even of the form $A\rho A^\dagger$ for 
some linear operator $A$. However, it turns out that every superoperator is the sum of operators 
of this form; for every superoperator $S$, there exist linear operators $A_1, \dots, A_K$ such 
that 

\[ S(\rho) = \sum_{i=1}^{K} A_i \rho A_i^\dagger. \]

Such a representation is known as the operator sum decomposition for S. The operator sum 
decomposition for a given superoperator S is not, in general, unique. 

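To obtain an operator sum decomposition for $S_U^\phi$, let $\{|\beta_i\rangle\}$ be a basis for $B$ and let $A_i : A \to A$ 
be the operator $A_i = \langle\beta_i|U|\phi\rangle$ defined in equation 10.5 of box 10.2. Then 

\[
\begin{aligned}
S_U^\phi(\rho) &= \mathrm{tr}_B\left(U(\rho \otimes |\phi\rangle\langle\phi|)U^\dagger\right) \\
&= \sum_{i=1}^{K} \langle\beta_i|\,U(\rho \otimes |\phi\rangle\langle\phi|)U^\dagger\,|\beta_i\rangle \\
&= \sum_{i=1}^{K} \langle\beta_i|U|\phi\rangle\,\rho\,\langle\phi|U^\dagger|\beta_i\rangle \\
&= \sum_{i=1}^{K} A_i \rho A_i^\dagger.
\end{aligned}
\]

To see how the third line follows from the second, first consider the pure state case $\rho = |\psi\rangle\langle\psi|$, 
from which the general case $\rho$, a mixture of pure states, follows. 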
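The derivation above can be sanity-checked numerically. The sketch below assumes NumPy and the standard basis for $B$; the function names, the fixed random seed, and the randomly generated $U$, $\rho$, and $|\phi\rangle$ are illustrative choices, not taken from the text.

```python
import numpy as np

def kraus_operators(U, phi, dA, dB):
    """A_i = <beta_i| U |phi> for the standard basis {|beta_i>} of B."""
    U4 = U.reshape(dA, dB, dA, dB)            # indices: a_out, b_out, a_in, b_in
    # Fix b_out = i and contract b_in with the amplitudes of |phi>.
    return [U4[:, i, :, :] @ phi for i in range(dB)]

def via_partial_trace(U, rho, phi, dA, dB):
    """tr_B( U (rho (x) |phi><phi|) U^dagger )."""
    joint = np.kron(rho, np.outer(phi, phi.conj()))
    out = U @ joint @ U.conj().T
    return np.trace(out.reshape(dA, dB, dA, dB), axis1=1, axis2=3)

def via_operator_sum(U, rho, phi, dA, dB):
    """sum_i A_i rho A_i^dagger."""
    return sum(A @ rho @ A.conj().T for A in kraus_operators(U, phi, dA, dB))

# Illustrative random instance: A is a qubit, B has dimension 3.
dA, dB = 2, 3
rng = np.random.default_rng(0)
M = rng.normal(size=(dA * dB, dA * dB)) + 1j * rng.normal(size=(dA * dB, dA * dB))
U, _ = np.linalg.qr(M)                         # Q factor of a QR decomposition is unitary
phi = rng.normal(size=dB) + 1j * rng.normal(size=dB)
phi = phi / np.linalg.norm(phi)
G = rng.normal(size=(dA, dA)) + 1j * rng.normal(size=(dA, dA))
rho = G @ G.conj().T
rho = rho / np.trace(rho)                      # Hermitian, positive, trace one

lhs = via_partial_trace(U, rho, phi, dA, dB)
rhs = via_operator_sum(U, rho, phi, dA, dB)
print(np.allclose(lhs, rhs))
```

The two routes agree because the partial trace over $B$ is the sum of the diagonal blocks $\langle\beta_i|\cdot|\beta_i\rangle$, which is exactly the second line of the derivation.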

For a given superoperator there are many possible operator sum decompositions; the operator 
sum decomposition depends on which basis is used. The next two examples give the operator 
sum decomposition in the standard basis for the operators of examples 10.4.1 and 10.4.2. 


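Example 10.4.3 Operator sum decomposition for $C_{not}$ and $|\phi\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$. The $C_{not}$ operator 
$U$ of example 10.4.1 can be written $U = X \otimes |1\rangle\langle 1| + I \otimes |0\rangle\langle 0|$. Suppose that initially the two 
systems are unentangled and system $A$ is in state $\rho = |\psi\rangle\langle\psi|$ and $B$ is in state $|\phi\rangle\langle\phi|$. 

\[
\begin{aligned}
S_U^\phi(\rho) &= \mathrm{tr}_B\left(U(\rho \otimes |\phi\rangle\langle\phi|)U^\dagger\right) \\
&= A_0 \rho A_0^\dagger + A_1 \rho A_1^\dagger
\end{aligned}
\]

where $A_0 = \langle 0|U|\phi\rangle$ and $A_1 = \langle 1|U|\phi\rangle$. Then, using the definition of $A_i$ found in equation 10.5 
of box 10.2, 

\[
\begin{aligned}
A_0|\psi\rangle &= \sum_{i=0}^{1} \langle\alpha_i|\langle 0|\,U\,|\psi\rangle|\phi\rangle\;|\alpha_i\rangle \\
&= \langle 0|\langle 0|\,(X \otimes |1\rangle\langle 1| + I \otimes |0\rangle\langle 0|)\,|\psi\rangle|\phi\rangle\;|0\rangle \\
&\quad + \langle 1|\langle 0|\,(X \otimes |1\rangle\langle 1| + I \otimes |0\rangle\langle 0|)\,|\psi\rangle|\phi\rangle\;|1\rangle \\
&= \big(\langle 0|\langle 0|(X \otimes |1\rangle\langle 1|)|\psi\rangle|\phi\rangle + \langle 0|\langle 0|(I \otimes |0\rangle\langle 0|)|\psi\rangle|\phi\rangle\big)|0\rangle \\
&\quad + \big(\langle 1|\langle 0|(X \otimes |1\rangle\langle 1|)|\psi\rangle|\phi\rangle + \langle 1|\langle 0|(I \otimes |0\rangle\langle 0|)|\psi\rangle|\phi\rangle\big)|1\rangle.
\end{aligned}
\]

Because $\langle 0|1\rangle = 0$, the first and third terms are zero, so 

\[
\begin{aligned}
A_0|\psi\rangle &= \langle 0|\langle 0|(I \otimes |0\rangle\langle 0|)|\psi\rangle|\phi\rangle\;|0\rangle + \langle 1|\langle 0|(I \otimes |0\rangle\langle 0|)|\psi\rangle|\phi\rangle\;|1\rangle \\
&= \langle 0|\psi\rangle\langle 0|\phi\rangle\,|0\rangle + \langle 1|\psi\rangle\langle 0|\phi\rangle\,|1\rangle \\
&= \langle 0|\phi\rangle\,|\psi\rangle.
\end{aligned}
\]

Since $|\phi\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, 

\[ A_0|\psi\rangle = \frac{1}{\sqrt{2}}|\psi\rangle, \qquad\text{so}\qquad A_0 = \frac{1}{\sqrt{2}} I. \]

Similar reasoning shows that 

\[
\begin{aligned}
A_1|\psi\rangle &= \langle 0|\langle 1|(X \otimes |1\rangle\langle 1|)|\psi\rangle|\phi\rangle\;|0\rangle + \langle 1|\langle 1|(X \otimes |1\rangle\langle 1|)|\psi\rangle|\phi\rangle\;|1\rangle \\
&= \langle 0|X|\psi\rangle\langle 1|\phi\rangle\,|0\rangle + \langle 1|X|\psi\rangle\langle 1|\phi\rangle\,|1\rangle \\
&= \frac{1}{\sqrt{2}} X|\psi\rangle,
\end{aligned}
\]

so 

\[ A_1 = \frac{1}{\sqrt{2}} X. \]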


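Example 10.4.4 Operator sum decomposition for $U_{switch}$ and $|\phi\rangle = |0\rangle$. Let 

\[ U_{switch} = |00\rangle\langle 00| + |10\rangle\langle 01| + |01\rangle\langle 10| + |11\rangle\langle 11| \]

and $|\phi\rangle = |0\rangle$. 

\[
\begin{aligned}
S_U^\phi(\rho) &= \mathrm{tr}_B\left(U(\rho \otimes |\phi\rangle\langle\phi|)U^\dagger\right) \\
&= A_0 \rho A_0^\dagger + A_1 \rho A_1^\dagger
\end{aligned}
\]

where $A_0 = \langle 0|U|\phi\rangle$ and $A_1 = \langle 1|U|\phi\rangle$. 

\[
\begin{aligned}
A_0|\psi\rangle &= \sum_{i=0}^{1} \langle\alpha_i|\langle 0|\,U\,|\psi\rangle|\phi\rangle\;|\alpha_i\rangle \\
&= \langle 0|\langle 0|\,|\phi\rangle|\psi\rangle\;|0\rangle + \langle 1|\langle 0|\,|\phi\rangle|\psi\rangle\;|1\rangle \\
&= \langle 0|\phi\rangle\langle 0|\psi\rangle\,|0\rangle + \langle 1|\phi\rangle\langle 0|\psi\rangle\,|1\rangle.
\end{aligned}
\]

Since $|\phi\rangle = |0\rangle$, 

\[ A_0|\psi\rangle = \langle 0|\psi\rangle\,|0\rangle \]

and 

\[ A_0 = |0\rangle\langle 0|. \]

Similar reasoning gives 

\[ A_1 = |0\rangle\langle 1|. \]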
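The operators found in examples 10.4.3 and 10.4.4 can be reproduced numerically. This is a small sketch assuming NumPy, the standard basis for $B$, and $A$ as the first tensor factor; `kraus_operators` is a hypothetical helper name introduced only for this illustration.

```python
import numpy as np

def kraus_operators(U, phi):
    """A_i = <i| U |phi> for two single-qubit systems A (first) and B (second)."""
    U4 = U.reshape(2, 2, 2, 2)                # indices: a_out, b_out, a_in, b_in
    return [U4[:, i, :, :] @ phi for i in range(2)]

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
P0 = np.array([[1.0, 0.0], [0.0, 0.0]])       # |0><0|
P1 = np.array([[0.0, 0.0], [0.0, 1.0]])       # |1><1|

# Example 10.4.3: C_not = X (x) |1><1| + I (x) |0><0|, |phi> = (|0> + |1>)/sqrt(2).
U_cnot = np.kron(X, P1) + np.kron(I2, P0)
plus = np.array([1.0, 1.0]) / np.sqrt(2)
A0, A1 = kraus_operators(U_cnot, plus)

# Example 10.4.4: U_switch maps |a b> to |b a>, |phi> = |0>.
U_switch = np.zeros((4, 4))
for a in (0, 1):
    for b in (0, 1):
        U_switch[2 * b + a, 2 * a + b] = 1
ket0 = np.array([1.0, 0.0])
B0, B1 = kraus_operators(U_switch, ket0)

print(A0, A1, B0, B1, sep="\n")
```

For the switch operator the resulting superoperator sends every input state to $|0\rangle\langle 0|$, since $B_0\rho B_0^\dagger + B_1\rho B_1^\dagger = \mathrm{tr}(\rho)\,|0\rangle\langle 0|$.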


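Each term $A_i \rho A_i^\dagger$ in the operator sum decomposition is Hermitian and positive, but generally 
does not have trace one. Since $\mathrm{tr}(A_i \rho A_i^\dagger) \geq 0$, the operator 

\[ \frac{A_i \rho A_i^\dagger}{\mathrm{tr}(A_i \rho A_i^\dagger)} \]

is, whenever the trace is nonzero, Hermitian and positive with trace one, and therefore is a density operator. Furthermore, since the trace of a Hermitian, 
positive operator is zero only if the operator is zero, and $1 = \mathrm{tr}(S_U^\phi(\rho)) = \sum_{i=1}^{K} \mathrm{tr}(A_i \rho A_i^\dagger)$, 
$S_U^\phi(\rho)$ is a probabilistic mixture of the operators $\frac{A_i \rho A_i^\dagger}{\mathrm{tr}(A_i \rho A_i^\dagger)}$: 

\[ S_U^\phi(\rho) = \sum_i p_i \frac{A_i \rho A_i^\dagger}{\mathrm{tr}(A_i \rho A_i^\dagger)} \qquad (10.8) \]

where $p_i = \mathrm{tr}(A_i \rho A_i^\dagger)$ and we have ignored any zero terms. 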

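Operator sum decompositions for superoperators $S$ on subsystem $A$ of system $X = A \otimes B$, and 
their dependence on the basis chosen for $B$, can be understood in terms of measurement. It is not 
a coincidence that equation 10.8 is reminiscent of the equation 

\[ \rho' = \sum_j p_j \frac{P_j \rho P_j^\dagger}{\mathrm{tr}(P_j \rho P_j^\dagger)} \]

that encapsulates the possible outcomes of measurement of $\rho$ by operator $O$ with associated 
projectors $P_j$. Let $A_i$ be the operator obtained in the operator sum decomposition for $S_U^\phi$ when 
using basis $\{|\beta_i\rangle\}$ for $B$. Suppose that after $U : A \otimes B \to A \otimes B$ was applied to $\rho \otimes |\phi\rangle\langle\phi|$, subsystem 
$B$ were measured with respect to the projectors $P_i = |\beta_i\rangle\langle\beta_i|$ for the $K = 2^k$ basis elements $|\beta_i\rangle$ 
for $B$. The best description of subsystem $A$ after this measurement is a probabilistic mixture of 
mixed states $\rho' = \sum_i p_i \rho_i$ where 

\[ \rho_i = \mathrm{tr}_B\left( \frac{(I \otimes P_i)\,U(\rho \otimes |\phi\rangle\langle\phi|)U^\dagger\,(I \otimes P_i^\dagger)}{\mathrm{tr}\left((I \otimes P_i)\,U(\rho \otimes |\phi\rangle\langle\phi|)U^\dagger\,(I \otimes P_i^\dagger)\right)} \right) \]

and 

\[ p_i = \mathrm{tr}\left((I \otimes P_i)\,U(\rho \otimes |\phi\rangle\langle\phi|)U^\dagger\,(I \otimes P_i^\dagger)\right). \]

Since 

\[ \mathrm{tr}_B\left((I \otimes |\beta_i\rangle\langle\beta_i|)\,U(\rho \otimes |\phi\rangle\langle\phi|)U^\dagger\,(I \otimes |\beta_i\rangle\langle\beta_i|)\right) = \langle\beta_i|\,U(\rho \otimes |\phi\rangle\langle\phi|)U^\dagger\,|\beta_i\rangle, \]

the density operator $\rho' = \sum_i p_i \rho_i$ is identical to the density operator $S_U^\phi(\rho)$. 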
10.4.3 A Relation Between Quantum State Transformations and Measurements 

Section 10.3.1 showed that the density operator representing the probabilistic mixture of outcomes 
of a measurement $O$ with associated projectors $\{P_j\}$ of a system $A$ initially represented by the 
mixed state $\rho$ is 

\[ \rho' = \sum_j p_j \frac{P_j \rho P_j^\dagger}{\mathrm{tr}(P_j \rho P_j^\dagger)}. \]

For any measurement $O$, the map 

\[ S_O : \rho \mapsto \rho' \]

can also be obtained in a different way, as the superoperator coming from a unitary transformation 
on a larger system. More specifically, for any observable $O$ of system $A$, there is a larger system 
$X = A \otimes B$, a unitary operator $U : X \to X$, and a state $|\phi\rangle$ of $B$ such that $S_U^\phi = S_O$. 

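To prove this statement, suppose $O$ has $M$ distinct eigenvalues. Let $B$ be a system of dimension 
$M$ with basis $\{|\beta_i\rangle\}$, and suppose that $B$ is initially in the state $|\phi\rangle = |\beta_0\rangle$. Let $U$ be any unitary 
operator on $X = A \otimes B$ that maps 

\[ |\psi\rangle|\beta_0\rangle \mapsto \sum_{i=1}^{M} P_i|\psi\rangle|\beta_i\rangle. \]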

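Then for $\rho = |\psi\rangle\langle\psi|$, 

\[
\begin{aligned}
S_U^\phi(\rho) &= \mathrm{tr}_B\left(U(\rho \otimes |\phi\rangle\langle\phi|)U^\dagger\right) \\
&= \sum_{i=1}^{M} A_i \rho A_i^\dagger \\
&= \sum_{i=1}^{M} A_i|\psi\rangle\langle\psi|A_i^\dagger
\end{aligned}
\]

where $A_i = \langle\beta_i|U|\phi\rangle$. Since $|\phi\rangle = |\beta_0\rangle$, 

\[
\begin{aligned}
A_i|\psi\rangle &= \sum_j \langle\alpha_j|\langle\beta_i|\,U\,|\psi\rangle|\beta_0\rangle\;|\alpha_j\rangle \\
&= \sum_j \langle\alpha_j|\langle\beta_i|\Big(\sum_k P_k|\psi\rangle|\beta_k\rangle\Big)\,|\alpha_j\rangle \\
&= \sum_j \langle\alpha_j|P_i|\psi\rangle\,|\alpha_j\rangle \\
&= P_i|\psi\rangle.
\end{aligned}
\]

So 

\[
\begin{aligned}
S_U^\phi(\rho) &= \sum_{i=1}^{M} P_i|\psi\rangle\langle\psi|P_i^\dagger \\
&= \sum_{i=1}^{M} p_i \frac{P_i|\psi\rangle\langle\psi|P_i^\dagger}{\mathrm{tr}(P_i|\psi\rangle\langle\psi|P_i^\dagger)}
\end{aligned}
\]

where $p_i = \mathrm{tr}(P_i|\psi\rangle\langle\psi|P_i^\dagger)$. There is debate within the quantum physics community as to the 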
extent to which this relationship between unitary operators and measurement clarifies various 
issues in the foundations of quantum mechanics. We do not elaborate on these issues here. 

10.4.4 Decoherence 

In practice, it is impossible to isolate a quantum computer completely from its environment. 
Because all physical qubits interact with their environment, the computational qubits of a quantum 
computer are properly viewed as a subsystem of a larger system consisting of the computation 
qubits and their environment. By an environment we mean a subsystem over which we have no 
control: we cannot gain information from it by measurement or apply gates to it. 

In some cases, the effect of an environmental interaction on the computational subsystem is 
reversible by transformations on the subsystem alone. But in other cases, decoherence occurs. In 
decoherence, information about the state of the computational subsystem is lost to the environment. 
Such errors are serious because the environment is beyond our computational control. The 
next two chapters develop quantum error correction and fault-tolerant techniques to counteract 
errors due to decoherence as well as other sorts of errors, such as those stemming from imperfections 
in the implementations of quantum gates. This section lays a foundation for that discussion 
by setting up an error model for errors due to interaction with the environment. 

The operator sum decomposition provides a means for describing the effect on the computational 
subsystem of an interaction with another subsystem in terms of operations on the 
computational subsystem alone. Using the operator sum decomposition, the effect on the computational 
subsystem of any interaction with the environment can be viewed as a mixture of $K$ 
errors resulting in the $K$ mixed states $\frac{A_i \rho A_i^\dagger}{\mathrm{tr}(A_i \rho A_i^\dagger)}$. 

Common error models suppose that the environment interacts separately with different parts 
of the computational subsystem. For example, a common error model consists of errors that are 
both local and Markov: 

• local: each qubit interacts only with its own environment, and 

• Markov: the state of a qubit’s environment, and its interaction with the qubit, is independent 
of the state of the environment in previous time intervals. 

More precisely, under a local error model, the errors to which an $n$-qubit system is subjected can 
be modeled by interaction with an environment $E = E_1 \otimes \cdots \otimes E_n$ such that the environment $E_i$ 
interacts with only the $i$th qubit of $X$; the errors can be modeled by unitary transforms of the form 
$U = U_1 \otimes \cdots \otimes U_n$ such that $U_i$ acts on $E_i$ and the $i$th qubit of $X$, and their effect on $X$ is given by superoperators of 
the form $S_U = S_{U_1} \otimes \cdots \otimes S_{U_n}$, where $S_{U_i}$ acts on only the $i$th qubit of $X$. 

A reasonable way to think of the Markov condition is that each qubit’s environment is renewed 
(or replaced) at each computational time step. More concretely, under a local and Markov error 
model, the computational subsystem $X$ at a given time $t$ interacts with an environment $E^t = 
E^t_1 \otimes \cdots \otimes E^t_n$ in such a way that the only interactions are between $E^t_i$ and the $i$th qubit of system 
$X$, and the current state of the environment $E^t$, and its interaction with $X$, is independent of the 
state of the environment at any previous time $s$. Most of the quantum error correcting codes and 
fault-tolerant techniques discussed in chapters 11 and 12 are designed to handle local and Markov 
errors. Techniques to handle other error models have been developed, some of which are briefly 
described in section 13.3. 

10.5 References 

Jozsa and Linden [167] show that any quantum algorithm that achieves exponential speedup over 
classical algorithms must entangle an increasing number of qubits. Their proof applies only to 
algorithms run in isolation in which the state is always in a pure state. The results of section 
10.4 show that any mixed state algorithm can be viewed as a pure state algorithm on a larger 
system. The result of Jozsa and Linden still applies in this more general setting, except that the 
entanglement could involve noncomputational qubits of the larger system; it is not required to be 
between the computational qubits. 

Efficient classical simulations of certain quantum systems have been found by Vidal and others 
[278, 204]. Meyer discusses the lack of entanglement throughout the Bernstein-Vazirani algorithm 
and related results [213]. 

Bennett and Shor’s “Quantum information theory” [38] discusses various entanglement measures 
for mixed states of bipartite systems, including some examples and a distillation protocol. 
It is generally a good overview of topics in quantum information theory, including a number of 
interesting topics we will not cover in this book. Bruß’s “Characterizing entanglement” [69] is 
an excellent fifteen-page overview of many of the most significant results about entanglement to 
date. Myhr’s master’s thesis, “Measures of entanglement in quantum mechanics” [215], gives a 
readable and more detailed account of many of these results. 

Nielsen’s majorization result is found in Nielsen [217]. The SLOCC classification of 3-qubit 
states was first described by Dür, Vidal, and Cirac in [107]. Briegel and Raussendorf define persistency 
of entanglement and maximal connectedness in [65], as well as introducing cluster states. 

10.6 Exercises 

Exercise 10.1 . Show that the definition of the partial trace is basis independent. 
Exercise 10.2. Show that $\mathrm{tr}_B(O_A \otimes O_B) = O_A\,\mathrm{tr}(O_B)$. 

Exercise 10.3. 

a. Find the density operators for the whole system and both qubits of $|\Psi^-\rangle = \frac{1}{\sqrt{2}}(|00\rangle - |11\rangle)$. 

b. Find the density operators for the whole system and both qubits of $|\Phi^+\rangle = \frac{1}{\sqrt{2}}(|01\rangle + |10\rangle)$. 

Exercise 10.4. Distinguishing pure and mixed states. 

a. Show that a density operator $\rho$ represents a pure state if and only if $\rho^2 = \rho$. In other words, $\rho$ 
is a projector. 

b. What can be said about the rank of the density operator of a pure state? 

Exercise 10.5. We showed that any density operator can be viewed as a probability distribution 
over a set of orthogonal states. Show by example that some density operators have multiple 
associated probability distributions, so that in general the probability distribution associated to a 
density operator is not unique. 

Exercise 10.6. Geometry of Bloch regions. 

a. Show that the Bloch region, the set $S$ of mixed states of an $n$-qubit system, can be parametrized 
by $2^{2n} - 1$ real parameters. 

b. Show that $S$ is a convex set. 

c. Show that the set of pure states of an $n$-qubit system can be parametrized by $2^{n+1} - 2$ real parameters, 
and therefore the set of density matrices corresponding to pure states can be parametrized 
in this way also. 

d. Explain why for $n \geq 2$ the boundary of the set of mixed states must consist of more than just 
pure states. 

e. Show that the extremal points, those that are not convex linear combinations of other points, 
are exactly the pure states. 

f. Characterize the non-extremal states that are on the boundary of the Bloch region. 

Exercise 10.7. Give a geometric interpretation for $R(\beta)$ and $T(\alpha)$ of Section 5.4.1 by determining 
their behavior on the set of mixed states viewed as points of the Bloch sphere. 

Exercise 10.8. The Schmidt decomposition. Every $m \times n$ matrix $M$, with $m \leq n$, has a singular 
value decomposition $M = UDV$ where $D$ is an $m \times n$ diagonal matrix with non-negative real 
entries, and $U$ and $V$ are $m \times m$ and $n \times n$ unitary matrices. 

Let $|\psi\rangle \in A \otimes B$, where $A$ has dimension $m$ and $B$ has dimension $n$, with $m \leq n$. Let $\{|i\rangle\}$ be a 
basis for $A$ and $\{|j\rangle\}$ be a basis for $B$. Then for some choice of $a_{ij} \in \mathbb{C}$, 

\[ |\psi\rangle = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} a_{ij}\,|i\rangle|j\rangle. \]

Let $M$ be the $m \times n$ matrix with entries $a_{ij}$. Use the singular value decomposition (SVD) for $M$ 
to find sets of orthonormal unit vectors $\{|\alpha_i\rangle\} \subset A$ and $\{|\beta_i\rangle\} \subset B$ such that 

\[ |\psi\rangle = \sum_{i=0}^{m-1} \lambda_i\,|\alpha_i\rangle|\beta_i\rangle \]

where $\lambda_i$ is non-negative. The $\lambda_i$ are called the Schmidt coefficients, and $K$, the number of nonzero $\lambda_i$, is 
called the Schmidt rank or Schmidt number of $|\psi\rangle$. 

Exercise 10.9. Singular value decomposition. Let $A$ be an $n \times m$ matrix. 

a. Let $|u_j\rangle$ be unit length eigenvectors of $A^\dagger A$ with eigenvalues $\lambda_j$. Explain how we know that 
$\lambda_j$ is real and non-negative for all $j$. 

b. Let $U$ be the matrix with $|u_j\rangle$ as its columns. Show that $U$ is unitary. 

c. For all eigenvectors with non-zero eigenvalues define $|v_j\rangle = \frac{A|u_j\rangle}{\sqrt{\lambda_j}}$. Let $V$ be the matrix with 
$|v_j\rangle$ as columns. Show that $V$ is unitary. 

d. Show that $V^\dagger A U$ is diagonal. 

e. Conclude that $A = VDU^\dagger$ for some diagonal $D$. What is $D$? 

Exercise 10.10. For $|\psi\rangle \in A \otimes B$, show that $|\psi\rangle$ is unentangled if and only if $S(\mathrm{tr}_B\,\rho) = 0$, 
where $\rho = |\psi\rangle\langle\psi|$. 

Exercise 10.11. 

a. Show that the states $\frac{1}{\sqrt{2}}(|01\rangle + |10\rangle)$ and $\frac{1}{\sqrt{2}}(|00\rangle - i|11\rangle)$ are maximally entangled. 

b. Write down two other maximally entangled states. 

Exercise 10.12. What is the maximum possible amount of entanglement, as measured by the 
von Neumann entropy, over all pure states of a bipartite quantum system A ® B where A has 
dimension n and B has dimension m with n > m ? 

Exercise 10.13. Claim: LOCC cannot convert an unentangled state to an entangled one. 

a. State the claim in more precise language. 

b. Prove the claim. 

Exercise 10.14. Show that the four Bell states $|\Psi^\pm\rangle$ and $|\Phi^\pm\rangle$ are all LOCC equivalent. 

Exercise 10.15. 

a. Show that any two-qubit state can be converted to $|00\rangle$ via LOCC. 

b. Show that any n -qubit state can be converted to a state unentangled with respect to the tensor 
decomposition into the n qubits. 

Exercise 10.16. Show that the vector of ordered eigenvalues $\lambda^\psi$ for the density operator of any 
unentangled state $|\psi\rangle$ of a bipartite system majorizes the vectors for any other state of the bipartite 
system. 
Exercise 10.17. Maximally entangled bipartite states. Let $|\psi\rangle$ be a state of the form 

\[ |\psi\rangle = \sum_{i=1}^{m} \frac{1}{\sqrt{m}}\,|\psi_i^A\rangle \otimes |\phi_i^B\rangle \]

where the $\{|\psi_i^A\rangle\}$ and $\{|\phi_i^B\rangle\}$ are orthonormal sets. Show that the vector $\lambda^\psi$ is majorized by $\lambda^\phi$ for 
all states $|\phi\rangle \in A \otimes B$. 

Exercise 10.18. Classify all two-qubit states up to SLOCC equivalence. 

Exercise 10.19. Show that $|GHZ_3\rangle$ can be converted via SLOCC to any A-BC decomposable 
state. 

Exercise 10.20. Show that the states $|GHZ_n\rangle$ are maximally connected. 

Exercise 10.21. Show that the states $|W_n\rangle$ are not maximally connected. 

Exercise 10.22. 

a. If $|\psi\rangle$ has persistency $n$ and $|\phi\rangle$ has persistency $m$, what is the persistency of $|\psi\rangle \otimes |\phi\rangle$? 

b. Show by induction that the persistency of $|W_n\rangle$ is $n - 1$. (Hint: You may want to use (a).) 


Exercise 10.23. 

a. Check that each of the cluster states of examples 10.2.7, 10.2.8, and 10.2.9 is stabilized by the 
operators of equation 10.7. 

b. Find the cluster state for the 1x5 lattice. 

c. Find the cluster state for the 2x2 lattice. 


Exercise 10.24. Maximal connectedness of cluster states. 

a. Show by induction that for the qubits corresponding to the ends of the chain in the cluster state 
$|\phi_n\rangle$ for the $1 \times n$ lattice, there is a sequence of single-qubit measurements that place these qubits 
in a Bell state. 

b. Show that for any two qubits $q_1$ and $q_2$ in a graph state, there exists a sequence of single-qubit 
measurements that leave these qubits as the end qubits of a cluster state of a $1 \times r$ lattice. Conclude 
that graph states are maximally connected. 

Exercise 10.25. Persistency of cluster states. For the cluster state $|\phi_N\rangle$ corresponding to the $1 \times N$ 
lattice for $N$ even, give a sequence of $N/2$ single-qubit measurements that result in a completely 
unentangled state. 

Exercise 10.26. Show that if $\{|x_i\rangle\}$ is the set of possible states resulting from a measurement and 
$p_i$ is the probability of each outcome, then $\rho = \sum_i p_i |x_i\rangle\langle x_i|$ is Hermitian, trace 1, and positive. 

Exercise 10.27. For initial mixed state $\rho_A \otimes \rho_B$, find the mixed state of $A$ after the transformation 
$U = |00\rangle\langle 00| + |10\rangle\langle 01| + |01\rangle\langle 10| + |11\rangle\langle 11|$ has been applied. 

Exercise 10.28. Suppose that subsystem $A = A_1 \otimes A_2$ and that $U : A \otimes B \to A \otimes B$ behaves as 
the identity on $A_1$. In other words, suppose $U = I \otimes V$ where $I$ acts on $A_1$ and $V$ acts on $A_2 \otimes B$. 
Show that for any state $|\phi\rangle$ of system $B$, the superoperator $S_U^\phi$ can be written as $I \otimes S$ for some 
superoperator $S$ on subsystem $A_2$ alone. 

Exercise 10.29. 

a. Give an alternative operator sum decomposition for example 10.4.3. 

b. Give an alternative operator sum decomposition for example 10.4.4. 

c. Give a general condition for two sets of operators $\{A_i\}$ and $\{A'_j\}$ to give operator sum 
decompositions for the same superoperator. 

Exercise 10.30. 

a. Describe a strategy for determining which sequence was sent in the game of section 10.3 if 
both qubits are received. More specifically, you receive a sequence of pairs of qubits. Either all 
pairs are randomly chosen from $\{|00\rangle, |11\rangle\}$ or all pairs are in the state $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. Describe 
a strategy for determining which sequence was sent. 

b. For each sequence, write down the density operator representing that sequence. 
For practical quantum computers to be built, techniques for handling environmental interactions 
that muddle the quantum computations are required. Shor’s algorithms, while universally 
acclaimed, were initially thought by many to be of only theoretical interest; estimates suggested 
that unavoidable interactions with the environment were many orders of magnitude too strong to 
be able to run Shor’s factoring algorithm on a number that was of practical interest, and no one 
had any idea as to how to perform error correction for quantum computation. Given the impossibility 
of copying an unknown quantum state, a straightforward application of classical methods 
to the quantum case is not possible, and it was far from obvious what else to do. Results such as 
the no-cloning theorem made many experts believe that robust quantum computation might be 
impossible. It turns out, however, that an elegant and surprising use of classical techniques forms 
the foundation of sophisticated quantum error correction techniques. Quantum error correction is 
now one of the most extensively developed areas of quantum computation. It was the discovery 
of quantum error correction, as much as of Shor’s algorithms, that turned quantum information 
processing into a significant field in its own right. 

In the classical world, error correcting codes are primarily used in data transmission. Quantum 
systems, however, are difficult to isolate sufficiently from environmental interactions while 
retaining the ability to perform computations. In any quantum system used to perform quantum 
information processing, the effects of interaction with the environment are likely to be so 
pervasive that quantum error correction will be used at all times. 

We begin in section 11.1 with a few simple examples to give a sense for the workings of 
quantum error correction, particularly purely quantum aspects such as how quantum superpositions 
of both errors and states are handled. A general framework for quantum error correction 
is given in section 11.2. This framework has similarities to the framework for classical codes 
but is considerably more complicated. Quantum error correcting codes must handle the infinite 
variety of single-qubit states and the peculiarly quantum ways in which qubits can interact 
with each other. In section 11.3, Calderbank-Shor-Steane (CSS) codes are presented. Then, in 
section 11.4, the more general class of stabilizer codes is described. Most of the specific quantum 
error correcting codes we consider are designed to correct all errors on $k$ or fewer qubits. Such 
codes work well for systems subject to independent single-qubit, or few-qubit, errors. While 
that sort of error behavior is expected in many situations, other reasonable error models exist. 
Throughout the chapter we pretend that quantum error correction can be carried out perfectly. 
Chapter 12 discusses fault-tolerant methods that enable quantum error correction to work even 
when carried out imperfectly. Other approaches to robust quantum computation are discussed in 
chapter 13. 

11.1 Three Simple Examples of Quantum Error Correcting Codes 

Classical error correcting codes map message words into a code space, consisting of longer words, 
in a redundant way that allows detection and correction of errors. Quantum error correcting codes 
embed the vector space of message states, called words, into a subspace of a larger vector space, 
the code space. A quantum algorithm that logically operates on $n$ qubits is implemented as an 
algorithm operating on the much larger $m$-qubit system in which the $n$ qubits are encoded. To 
detect and correct an error, computation into ancilla qubits is performed and the ancilla are measured. 
Error correcting transformations are applied according to the result of that measurement. 
To preserve superpositions, the encoding and measurements must be carefully designed so that 
these measurements give information only about what error occurred and not about the encoded 
state of the computation. 

To give a general sense for quantum error correction, particularly its use of measurement and 
its ability to correct superpositions of correctable errors, we first describe a simple code that 
corrects only single-qubit bit-flip errors, then a code that corrects only single-qubit phase errors, 
and finally a code that corrects all single-qubit errors. 

11.1.1 A Quantum Code That Corrects Single Bit-Flip Errors 

A single-qubit bit-flip error applies $X$ to one of the qubits of the quantum computer. The following 
simple code is a quantum version of the classical [3, 1] repetition code, which will be 
described more formally in section 11.2. It detects and corrects any of the three single bit-flip 
errors 

\[ \{X_2 = X \otimes I \otimes I,\; X_1 = I \otimes X \otimes I,\; X_0 = I \otimes I \otimes X\}, \]

where $X_i$ means the tensor product of $X$ applied to the $i$th qubit with the identity on all other 
qubits. 

In brief, the [3, 1] repetition code encodes each bit in three bits as 

    0 -> 000 
    1 -> 111. 

Decoding is done by majority rules: 

    000, 001, 010, 100 -> 0 
    011, 101, 110, 111 -> 1. 

To implement majority rules, first determine if an error has occurred by comparing the first bit 
with each of the other bits. More formally, to make the comparisons, use two additional bits, 
called ancilla, to hold the computation of $b_2 \oplus b_1$ and $b_2 \oplus b_0$ respectively. This computation is 
called the syndrome computation. The syndrome values of $b_2 \oplus b_1$ and $b_2 \oplus b_0$ determine which 
error correcting transformation should be applied, as shown in table 11.1. The first line of the 
table says that if $b_2 \oplus b_1$ and $b_2 \oplus b_0$ are both zero, do nothing. The second line says that if 
$b_2 = b_1$ but $b_2 \neq b_0$, flip $b_0$ so that it agrees with $b_2$ and $b_1$, the majority. Similarly, if $b_2 \neq b_1$ 
and $b_2 = b_0$, flip $b_1$. Finally, if $b_2 \neq b_1$ and $b_2 \neq b_0$, then $b_1$ and $b_0$ must agree, so flip $b_2$ to 
make it agree with the majority. No matter what happened previously, this procedure results in a 
codeword. However, if more than one error has occurred it will correct to the wrong word. For 
example, if the original string was 000 and two bit-flip errors occur, one on the first bit and one 
on the third, the resulting string, 101, will be “corrected” to 111 under this procedure. The 
[3, 1] repetition code can correct only single bit-flip errors. More powerful codes, such as 
the [n, 1] repetition codes that encode one bit in $n$ bits and decode by majority rules, can correct 
more errors. 
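The classical procedure just described can be sketched in a few lines. The sketch assumes the syndrome bits are $b_2 \oplus b_1$ and $b_2 \oplus b_0$ (the first bit compared with each of the others) and stores a word as the list `[b2, b1, b0]`; the function names and the lookup table `FLIP_POSITION` are illustrative.

```python
def encode(bit):
    """[3, 1] repetition code: 0 -> 000, 1 -> 111, stored as [b2, b1, b0]."""
    return [bit, bit, bit]

def syndrome(word):
    """Compare the first bit with each of the other two bits."""
    b2, b1, b0 = word
    return (b2 ^ b1, b2 ^ b0)

# Syndrome -> list position to flip (None means do nothing).
# Position 0 holds b2, position 1 holds b1, position 2 holds b0.
FLIP_POSITION = {
    (0, 0): None,   # no error detected
    (0, 1): 2,      # flip b0
    (1, 0): 1,      # flip b1
    (1, 1): 0,      # flip b2
}

def correct(word):
    """Return the codeword selected by the syndrome lookup."""
    fixed = list(word)
    position = FLIP_POSITION[syndrome(word)]
    if position is not None:
        fixed[position] ^= 1
    return fixed

def decode(word):
    """Majority-rule decoding via syndrome correction."""
    return correct(word)[0]

# A single flip is repaired; two flips are "corrected" to the wrong codeword.
print(correct([1, 0, 0]))   # one error on the first bit of 000
print(correct([1, 0, 1]))   # two errors on 000
```

The second call illustrates the limitation noted above: the string 101 is taken to the codeword 111 rather than back to 000.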

Table 11.1 
Syndrome and corresponding error correcting transformations for the classical [3, 1] repetition code. 

    $b_2 \oplus b_1$    $b_2 \oplus b_0$    Error correcting transformation 
    0                   0                   identity 
    0                   1                   flip $b_0$ 
    1                   0                   flip $b_1$ 
    1                   1                   flip $b_2$ 

Both classical and quantum error correction spread the information we want to protect across 
several qubits so that individual errors have less of an effect. The [3, 1] repetition code encodes 0 
and 1 as the bit strings 000 and 111 respectively. In the quantum setting, let $C_{BF}$ be the subspace 
spanned by $\{|000\rangle, |111\rangle\}$. This quantum code encodes the state $|0\rangle$ in the state $|000\rangle$ and $|1\rangle$ in the 
state $|111\rangle$. Linearity of the code and these relations define a general encoding $c_{BF}$ of single-qubit 
states into the subspace $C_{BF}$ of the state space for a three-qubit system: 

\[
\begin{aligned}
c_{BF} : |0\rangle \otimes |00\rangle &\to |000\rangle \\
|1\rangle \otimes |00\rangle &\to |111\rangle,
\end{aligned}
\]

so $a|0\rangle + b|1\rangle$ maps to $a|000\rangle + b|111\rangle$. In general, for a quantum code, we use the notation $|\tilde{0}\rangle$ 
for the encoding of $|0\rangle$ and likewise for other states. For this code, $|\tilde{0}\rangle = |000\rangle$ and $|\tilde{1}\rangle = |111\rangle$. 

The set of states $a|\tilde{0}\rangle + b|\tilde{1}\rangle = a|000\rangle + b|111\rangle$ is a two-dimensional vector space, so it may 
be considered a qubit in its own right. It is called the logical qubit to distinguish it from the 
three computation qubits whose tensor product forms the entire eight-dimensional code space. 
States such as $|101\rangle$ that are not logical qubit values are not legitimate computational states. 
Legitimate states, the possible values of the logical qubits, are called codewords. On a logical 
qubit $a|000\rangle + b|111\rangle$, single bit-flip errors no longer take legitimate computational states to 
legitimate computational states, but to states that are not codewords. For example, a bit-flip error 
on the first qubit results in the state $a|100\rangle + b|011\rangle$, which is not a codeword because it is not 
in $C_{BF}$. The goal of an error correction scheme is to detect non-codeword states and transform 
them back to codewords. 

To detect an error, we compute the XOR of the first and second qubits into one ancilla qubit, 
and the XOR of the first and third qubits into another. More formally, 

\[ U_{BF} : |x_2, x_1, x_0, 0, 0\rangle \to |x_2, x_1, x_0, x_2 \oplus x_1, x_2 \oplus x_0\rangle. \]

The transformation $U_{BF}$ is called the syndrome extraction operator and has quantum circuit 

[Circuit diagram: inputs $|x_2\rangle$, $|x_1\rangle$, $|x_0\rangle$, $|0\rangle$, $|0\rangle$; the two ancilla outputs are labeled $|a_1\rangle$ and $|a_0\rangle$.] 

The ancilla qubits are then measured in the standard basis, and the error syndrome is obtained. 

The use of the syndrome parallels that of the classical [3, 1] repetition code. In addition to 
correcting all single bit-flip errors, the code must not corrupt correct states, so for convenience we 
also consider $I \otimes I \otimes I$ as an “error” we can correct. The information we gain from measuring 
the ancilla enables us to choose the right transformation to apply to correct the error. Since 
$X = X^{-1}$, the correcting transformation in this case is the same as the error transformation that 
occurred. The following table gives the transformation to apply given the measurement of the 
ancilla: 

    Bit flipped    Syndrome        Error correction 
    none           $|00\rangle$    none 
    2              $|11\rangle$    $X_2 = X \otimes I \otimes I$ 
    1              $|10\rangle$    $X_1 = I \otimes X \otimes I$ 
    0              $|01\rangle$    $X_0 = I \otimes I \otimes X$ 


Because of its close parallel with the classical [3, 1] code, it is not surprising that this procedure 
corrects any single bit-flip errors on the encoded standard basis states $|\tilde{0}\rangle = |000\rangle$ and $|\tilde{1}\rangle = |111\rangle$. 
In addition, it corrects single bit-flip errors on superpositions of codewords. 


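Example 11.1.1 Correcting a bit-flip error on a superposition. A general superposition $|\psi\rangle = 
a|0\rangle + b|1\rangle$ is encoded as 

\[ |\tilde{\psi}\rangle = a|\tilde{0}\rangle + b|\tilde{1}\rangle = a|000\rangle + b|111\rangle. \]

Suppose $|\tilde{\psi}\rangle$ is subject to the single bit-flip error $X_2 = X \otimes I \otimes I$, resulting in 

\[ X_2|\tilde{\psi}\rangle = a|100\rangle + b|011\rangle. \]

Applying the syndrome extraction operator $U_{BF}$ to $X_2|\tilde{\psi}\rangle \otimes |00\rangle$ results in the state 

\[
\begin{aligned}
U_{BF}\big((X_2|\tilde{\psi}\rangle) \otimes |00\rangle\big) &= a|100\rangle|11\rangle + b|011\rangle|11\rangle \\
&= (a|100\rangle + b|011\rangle)|11\rangle.
\end{aligned}
\]

Measuring the two ancilla qubits yields $|11\rangle$, and the state is now 

\[ (a|100\rangle + b|011\rangle) \otimes |11\rangle. \]

The error can be removed by applying the inverse error operator $X_2$, corresponding to the measured 
syndrome $|11\rangle$, to the first three qubits. Doing so reconstructs the original encoded state 

\[ |\tilde{\psi}\rangle = a|\tilde{0}\rangle + b|\tilde{1}\rangle = a|000\rangle + b|111\rangle. \]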
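The steps of example 11.1.1 can be traced with a small state-vector simulation. This is only a sketch, assuming NumPy and a five-qubit register ordered $|x_2, x_1, x_0, a_1, a_0\rangle$ with $x_2$ as the most significant bit; the helper names, the sample amplitudes, and the random seed are illustrative.

```python
import numpy as np

def index(x2, x1, x0, a1, a0):
    """Basis-state index for |x2, x1, x0, a1, a0>."""
    return (x2 << 4) | (x1 << 3) | (x0 << 2) | (a1 << 1) | a0

def encode(a, b):
    """a|000> + b|111> with both ancilla qubits in |0>."""
    state = np.zeros(32, dtype=complex)
    state[index(0, 0, 0, 0, 0)] = a
    state[index(1, 1, 1, 0, 0)] = b
    return state

def apply_x(state, qubit):
    """Bit flip X_qubit on computational qubit x_qubit (qubit in {0, 1, 2})."""
    out = np.zeros_like(state)
    for i, amp in enumerate(state):
        out[i ^ (1 << (qubit + 2))] = amp
    return out

def extract_syndrome(state):
    """U_BF: XOR x2^x1 into ancilla a1 and x2^x0 into ancilla a0."""
    out = np.zeros_like(state)
    for i, amp in enumerate(state):
        x2, x1, x0, a1, a0 = [(i >> s) & 1 for s in (4, 3, 2, 1, 0)]
        out[index(x2, x1, x0, a1 ^ x2 ^ x1, a0 ^ x2 ^ x0)] = amp
    return out

# Measured syndrome (a1 a0 as an integer) -> qubit to flip back.
CORRECTION = {0b00: None, 0b11: 2, 0b10: 1, 0b01: 0}

def measure_and_correct(state, rng):
    """Measure the ancilla in the standard basis, then undo the indicated flip."""
    state = extract_syndrome(state)
    probs = np.array([sum(abs(state[i]) ** 2 for i in range(32) if i & 3 == s)
                      for s in range(4)])
    s = int(rng.choice(4, p=probs / probs.sum()))
    projected = np.array([amp if i & 3 == s else 0 for i, amp in enumerate(state)])
    projected = projected / np.linalg.norm(projected)
    if CORRECTION[s] is not None:
        projected = apply_x(projected, CORRECTION[s])
    return s, projected

a, b = 0.6, 0.8j
damaged = apply_x(encode(a, b), 2)            # error X_2 = X (x) I (x) I
s, fixed = measure_and_correct(damaged, np.random.default_rng(1))
print(format(s, "02b"), fixed[index(0, 0, 0, 1, 1)], fixed[index(1, 1, 1, 1, 1)])
```

After correction the computational qubits again carry $a|000\rangle + b|111\rangle$, while the ancilla remain in the measured syndrome state $|11\rangle$.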


The intuition behind why this procedure does not irreparably disturb the quantum state, even 
though it includes measurement, is that measurement of the ancilla by the syndrome extraction 
operator tells us nothing about individual computational qubit states, only about what errors 
occurred. If the syndrome extraction operator is applied to a codeword $a|\tilde{0}\rangle + b|\tilde{1}\rangle$, the result of 
the measurement of the ancilla will be the syndrome 00 regardless of whether the codeword is 
$|\tilde{0}\rangle$, $|\tilde{1}\rangle$, or some superposition of the two. Similarly, if error $X_2 = X \otimes I \otimes I$ has occurred, the 
syndrome will be in state $|11\rangle$ regardless of whether the computational qubits are in state $|100\rangle$, 
$|011\rangle$, or a superposition of the two. Thus measuring the ancilla qubits gives no information 
about the states of the computational qubits, but it does give information about what error has 
occurred. Measuring the ancilla qubits gives information about the error without disturbing the 
computation, even when the initial state is a superposition $a|000\rangle + b|111\rangle$. 

Unlike in the classical case, linear combinations of quantum errors are also possible. This same 
procedure also corrects linear combinations of bit-flip errors. 


Example 11.1.2 Correcting a linear combination of bit-flip errors. Suppose the state $|0\rangle$ has been
encoded as $|\tilde{0}\rangle = |000\rangle$ and an error $E = \alpha X \otimes I \otimes I + \beta I \otimes X \otimes I$, a linear combination of the
two single bit-flip errors $X_2$ and $X_1$, occurs, yielding

\[ E|\tilde{0}\rangle = \alpha|100\rangle + \beta|010\rangle. \]

Applying the syndrome extraction operator $U_{BF}$ to $(E|\tilde{0}\rangle) \otimes |00\rangle$ results in the state

\[ U_{BF}((E|\tilde{0}\rangle) \otimes |00\rangle) = \alpha|100\rangle|11\rangle + \beta|010\rangle|10\rangle. \]

Measuring the two auxiliary qubits of this state yields either $|11\rangle$ or $|10\rangle$. If the measurement
produces the former, the state is now $|100\rangle$. The measurement has the almost magical effect
of causing all but one summand of the error to disappear. The remaining part of the error can
be removed by applying the inverse error operator $X_2 = X \otimes I \otimes I$, corresponding to the measured
syndrome $|11\rangle$. Doing so reconstructs the original encoded state $|\tilde{0}\rangle = |000\rangle$. If instead the
syndrome measurement yields $|10\rangle$, we would apply $X_1$ to $|010\rangle$ to recover the original state
$|\tilde{0}\rangle = |000\rangle$.
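
The collapse described in this example can be made concrete with a short calculation. This is a sketch under two assumptions: the syndrome is taken to be the pair of parities $(x_2 \oplus x_1, x_2 \oplus x_0)$, consistent with the syndromes above, and the coefficients are illustrative values chosen so that $|\alpha|^2 + |\beta|^2 = 1$. The helper names are hypothetical.

```python
def syndrome(bits):
    """Ancilla value assumed to be written for the data basis state |x2 x1 x0>."""
    x2, x1, x0 = (int(b) for b in bits)
    return f"{x2 ^ x1}{x2 ^ x0}"

def outcome_probabilities(state):
    """Probability of each syndrome when the ancilla qubits are measured."""
    probs = {}
    for bits, amp in state.items():
        s = syndrome(bits)
        probs[s] = probs.get(s, 0.0) + abs(amp) ** 2
    return probs

alpha, beta = 0.6, 0.8j                        # illustrative coefficients
after_error = {"100": alpha, "010": beta}      # E|0~> = alpha|100> + beta|010>

probs = outcome_probabilities(after_error)
# Data basis states compatible with each syndrome after the measurement.
survivors = {s: [b for b in after_error if syndrome(b) == s] for s in probs}
```

Each syndrome is compatible with exactly one summand, so whichever outcome occurs, a single bit flip ($X_2$ for syndrome 11, $X_1$ for syndrome 10) restores $|000\rangle$.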


While linear combinations of single bit-flip errors can be corrected in this way, multiple bit-flip 
errors cannot be corrected by this code. The distinction between linear combinations of single 
bit-flip errors and multiple bit-flip errors is that in the former case any term in the superposition 
representing a computational state contains only one error, but in the second case a single term 
may contain multiple errors that will be misinterpreted by the syndrome. 

In the classical case, the [3, 1] code corrects all possible single bit errors. The quantum code
$C_{BF}$, while based on the [3, 1] code, does not correct all single-qubit errors. In the classical
case, bit flips are the only possible errors; in the quantum case, there is an infinite continuum
of possible single-qubit errors. The code $C_{BF}$ does not even detect, let alone correct, phase
errors.


Example 11.1.3 Undetected phase error. Suppose the quantum state $|+\rangle$, encoded as

\[ |\tilde{+}\rangle = \frac{1}{\sqrt{2}}(|000\rangle + |111\rangle), \]

is subjected to a phase error $E = Z \otimes I \otimes I$. The state $|\tilde{+}\rangle$ becomes the error state

\[ E|\tilde{+}\rangle = \frac{1}{\sqrt{2}}(|000\rangle - |111\rangle). \]

The syndrome extraction operator $U_{BF}$ applied to $E|\tilde{+}\rangle|00\rangle$ results in $E|\tilde{+}\rangle|00\rangle$, so no error is
detected, let alone corrected.
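
A short check makes the failure explicit. The sketch below assumes, as the syndromes in the earlier examples suggest, that the ancilla records the parities $x_2 \oplus x_1$ and $x_2 \oplus x_0$ of each data basis state. A phase flip changes only signs, never bit values, so every term still reports syndrome 00. The function names are hypothetical.

```python
from math import sqrt

def apply_z(state, qubit):
    """Apply Z to data qubit `qubit` (2 = leftmost, 0 = rightmost)."""
    pos = 2 - qubit
    return {bits: (-amp if bits[pos] == "1" else amp) for bits, amp in state.items()}

def possible_syndromes(state):
    """Set of ancilla values that can be observed for this data state."""
    found = set()
    for bits in state:
        x2, x1, x0 = (int(b) for b in bits)
        found.add(f"{x2 ^ x1}{x2 ^ x0}")
    return found

r = 1 / sqrt(2)
plus_encoded = {"000": r, "111": r}     # encoded |+>
damaged = apply_z(plus_encoded, 2)      # Z on the leftmost qubit flips the relative sign
```

The damaged state differs from the encoded one, yet `possible_syndromes(damaged)` contains only `"00"`, the same result as for an undamaged codeword.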


It is easy to construct a code that corrects all single-qubit phase-flip errors, but does not correct 
single-qubit bit-flip errors. The next section describes such a code. To obtain a code that corrects 
all single-qubit errors requires more cleverness. It turns out that, by carefully combining codes that 
correct bit-flip and phase-flip errors, a code correcting all single-qubit errors can be constructed. 
Such a code is given in section 11.1.3. 

11.1.2 A Code for Single-Qubit Phase-Flip Errors 

Consider the three single-qubit phase-flip errors $Z_2$, $Z_1$, $Z_0$ of a three-qubit system, where

\[ Z_2 = Z \otimes I \otimes I, \quad Z_1 = I \otimes Z \otimes I, \quad Z_0 = I \otimes I \otimes Z. \]

Phase-flip errors $Z_i$ in the standard basis are bit-flip errors $X = HZH$ in the Hadamard basis
$\{|+\rangle, |-\rangle\}$, and vice versa. This observation suggests that appropriate modifications to the bit-flip
code $C_{BF}$ of section 11.1.1 will result in a code $C_{PF}$ that corrects phase-flip errors instead. To
obtain the logical qubits for the code $C_{PF}$, apply the Walsh-Hadamard transformation
$W^{(3)} = H \otimes H \otimes H$ to the logical qubits of the code $C_{BF}$; the logical qubits for $C_{PF}$ are
$|\tilde{0}\rangle = |{+}{+}{+}\rangle$ and $|\tilde{1}\rangle = |{-}{-}{-}\rangle$.

The phase-flip error $Z_2$ sends $|{+}{+}{+}\rangle$ to $|{-}{+}{+}\rangle$ and $|{-}{-}{-}\rangle$ to $|{+}{-}{-}\rangle$. To detect such
errors, the syndrome extraction operator $U_{PF}$ for $C_{PF}$ can be obtained from $U_{BF}$ by changing
basis from the standard basis to the Hadamard basis. Since, in the Hadamard basis, phase flips
appear as bit flips, applying $U_{BF}$ from code $C_{BF}$ detects the error. Once the syndrome has been
obtained by measuring the ancilla qubits in the standard basis, the error can be corrected by
applying the bit-flip operator corresponding to the syndrome for code $C_{BF}$ and then applying $W$
to change back to the original basis. Alternatively, because $HX = ZH$, the error may be corrected
by first applying $W$, and then the appropriate error correction transformation from the following
table:


Bit shifted (from left)   Syndrome        Error correction
none                      $|00\rangle$    none
0                         $|11\rangle$    $Z_2 = Z \otimes I \otimes I$
1                         $|10\rangle$    $Z_1 = I \otimes Z \otimes I$
2                         $|01\rangle$    $Z_0 = I \otimes I \otimes Z$

Thus $U_{PF} = W U_{BF} W$, with implementation

[Figure: circuit diagram implementing $U_{PF}$]



The code $C_{PF}$ corrects all single-qubit relative phase errors, not just $Z$, because any single-qubit
phase error is a linear combination of $Z$ and $I$ up to an irrelevant global phase factor:

\[ \cos\frac{\phi}{2}\, I - i \sin\frac{\phi}{2}\, Z = e^{-i\phi/2} \begin{pmatrix} 1 & 0 \\ 0 & e^{i\phi} \end{pmatrix}. \]

The code $C_{PF}$ does not correct bit-flip errors, let alone general single-qubit errors.
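
The decomposition of a relative phase error into $I$ and $Z$ can be checked numerically. The sketch below compares the diagonal entries of $\cos(\phi/2) I - i \sin(\phi/2) Z$ with those of $e^{-i\phi/2}\,\mathrm{diag}(1, e^{i\phi})$, using $Z = \mathrm{diag}(1, -1)$. The function names and sample angles are illustrative choices.

```python
import cmath
from math import cos, sin

def i_z_combination(phi):
    """Diagonal entries of cos(phi/2) I - i sin(phi/2) Z, with Z = diag(1, -1)."""
    c, s = cos(phi / 2), sin(phi / 2)
    return (c - 1j * s, c + 1j * s)

def scaled_phase_error(phi):
    """Diagonal entries of e^{-i phi/2} diag(1, e^{i phi})."""
    g = cmath.exp(-1j * phi / 2)
    return (g, g * cmath.exp(1j * phi))

def agree(phi, tol=1e-12):
    """True when both sides match entry by entry within the tolerance."""
    pairs = zip(i_z_combination(phi), scaled_phase_error(phi))
    return all(tol > abs(x - y) for x, y in pairs)

checks = [agree(phi) for phi in (0.0, 0.3, 1.0, 3.14159, -2.5)]
```

Since correcting $I$ (no error) and $Z$ suffices, and the measurement selects one of the two, the same procedure handles every angle $\phi$.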


11.1.3 A Code for All Single-Qubit Errors 

Section 11.2.11 shows that a quantum error correcting code $C$ that can correct all $X_i$ and all $Z_i$
errors can also correct all $Y_i$ errors. Section 11.2.9 shows that any superposition (linear
combination) of correctable errors is correctable. Section 11.2.9 also shows that the Pauli errors $I$, $X$, $Y$,
and $Z$ form a basis for all single-qubit errors. So if we can design a code that corrects all $X_i$ and
$Z_i$ errors, the code will actually correct all single-qubit errors.

To construct such a code, it is natural to try to combine $C_{BF}$ and $C_{PF}$. First encoding a
qubit using $C_{PF}$ and then encoding each resulting qubit using $C_{BF}$ leads to the nine-qubit
code

\[ |0\rangle \to |\tilde{0}\rangle = \frac{1}{\sqrt{8}}(|000\rangle + |111\rangle) \otimes (|000\rangle + |111\rangle) \otimes (|000\rangle + |111\rangle), \]

\[ |1\rangle \to |\tilde{1}\rangle = \frac{1}{\sqrt{8}}(|000\rangle - |111\rangle) \otimes (|000\rangle - |111\rangle) \otimes (|000\rangle - |111\rangle), \]

known as Shor's nine-qubit code. For convenience, we often write these states as

\[ |0\rangle \to |\tilde{0}\rangle = \frac{1}{\sqrt{8}}(|000\rangle + |111\rangle)^{\otimes 3}, \]

\[ |1\rangle \to |\tilde{1}\rangle = \frac{1}{\sqrt{8}}(|000\rangle - |111\rangle)^{\otimes 3}. \]

To perform error correction, first use $U_{BF}$ on each block of three qubits to correct for possible
$X$ errors in each block separately. At this point the bit values of the state are correct (in any
term of the superposition, the three qubits of each block now have the same bit value), but the
relative phases may be wrong. To correct phase errors, a variant of $U_{PF}$ is used, essentially an
expansion of $U_{PF}$ to nine qubits instead of three. More details are given in section 11.3.

The term code, in both the classical and quantum setting, refers to the set of codewords.
The mapping of the original strings or states into the codewords is not of great importance; a
different mapping allows exactly the same set of errors to be corrected. Moreover, the encoding
map is not generally implemented. The mapping of $a|0\rangle + b|1\rangle$ to $a|\tilde{0}\rangle + b|\tilde{1}\rangle$ should be viewed
as an abstract mapping; we do not start with qubits of the form $a|0\rangle + b|1\rangle$ and then encode
them. Rather, we define the logical qubits of a system in this way, and we design gates and
interpret measurements in terms of these logical qubits. For example, for Shor's code, instead
of computing directly on $n$ single qubits, each qubit is encoded in 9 qubits, totaling $9n$ qubits
altogether. All quantum computation takes place on the $n$ logical qubits, each consisting of nine
qubits. It is on the $2^n$-dimensional subspace containing the logical qubits, not on the full
$2^{9n}$-dimensional space, that we compute. Error correction returns states to this subspace, and it is on
this $2^n$-dimensional subspace, not on the full $2^{9n}$-dimensional space, that we need a universal set
of gates. Sections 11.2.8 and 11.4.4, and then much of chapter 12, concern the design of such
gates.

Later sections describe codes that correct multiple-qubit errors and codes that correct all
single-qubit errors using fewer than nine qubits. Before discussing those codes, we need to develop more
systematic ways of thinking about and describing codes.

11.2 Framework for Quantum Error Correcting Codes 

As section 10.4.4 explained, errors on the computational system due to interactions with the 
environment are linear, but not necessarily unitary. Because unitary transformations are invertible,
if we can figure out what unitary error has occurred, we can correct it. But general errors
may not have inverse transformations, so if such an error occurs, even if we have been able 
to determine which error has occurred, it is not obvious how to correct it. At first glance we 
might guess that such errors cannot be corrected without access and control over the part of the 
environment that interacted with the system. It is true that these errors cannot be corrected by 
applying unitary quantum transformations to the computational system alone. By measuring the 
system, however, or by entangling the system with auxiliary qubits, nonunitary errors can be 
corrected. 

When a system has been subjected to decoherence under which it undergoes a nonunitary 
transformation, information about the original state of the system has been lost. For example, 
decoherence could swap a qubit in the environment with a qubit of the computational system, 
resulting in a complete loss of information about that qubit, except what can be deduced from
other qubits. If the qubit's state was completely uncorrelated with the other qubits' states, all
information about the state of that qubit is lost. The idea behind any sort of scheme for protecting 
information stored in quantum states is to embed the quantum states we care about in highly 
correlated states of a larger quantum system. To correct against general quantum errors, this 
correlation must be quantum; these states must be highly entangled states. 

The art of designing quantum error correcting codes is to choose the embedding of $k$ logical
qubits into an $n$-qubit system in such a way that measurements can correct the most common
errors to which the system is likely to be subjected. Generally, this embedding is taken to be
linear: it is given by a linear map between the $2^k$-dimensional vector space of the logical system
and the $2^n$-dimensional vector space of the larger system. We consider only linear codes
here. Quantum codes have been designed for many types of errors. The most frequently considered
family of errors consists of all errors on $t$ or fewer qubits. We concentrate on this
family of errors after presenting a general framework for quantum error correction. As physical
implementations of quantum computers are developed, it will be possible to determine to
which sorts of errors a given physical device is most subject and to design error correcting
codes or other forms of error protection to guard most efficiently and effectively against those
errors.

Linear quantum codes are closely related to classical block codes. For each concept in quantum 
error correction, we first review related concepts from classical codes. For this reason, this 
section alternates between short subsections describing classical error correction and subsections 
describing quantum error correction. 

This exposition is most suitable for readers who have some familiarity with classical error 
correcting codes; readers new to error correcting codes may wish to read all of the classical 
sections first to get a feel for the general strategies employed in error correction. Both classical 
and quantum error correction rely heavily on group theory. Boxes containing brief reviews of 
groups, subgroups, and Abelian groups can be found in section 8.6.1 and section 8.6.2. A few 
more boxes are interspersed throughout this chapter. Readers new to group theory will need to 
study the relevant sections of a text devoted to group theory. Suggested texts are given in the 
reference section at the end of this chapter. 

This section describes a general nonconstructive framework for linear quantum error correcting 
codes, specifying properties that all linear quantum error correcting codes must satisfy. This 
framework pays no attention to whether or how a code can be efficiently implemented. This issue 
is crucial to whether the code is useful or not and will be dealt with more carefully later in this 
chapter and in chapter 12. 

11.2.1 Classical Error Correcting Codes 

A classical $[n, k]$ block code $C$ is a size $2^k$ subset of the $2^n$ possible $n$-bit strings. The set of
$n$-bit strings is a group, written $\mathbf{Z}_2^n$, under bitwise addition modulo 2. If the size $2^k$ subset $C$ is
a subgroup of $\mathbf{Z}_2^n$, then the code is said to be an $[n, k]$ linear block code. When a code is used,
a specific encoding function $c : \mathbf{Z}_2^k \to \mathbf{Z}_2^n$ is chosen, where $c$ is an isomorphism between $\mathbf{Z}_2^k$, the
message space, the set of all $k$-bit strings, and $C$, the code space: $c : \mathbf{Z}_2^k \to C \subset \mathbf{Z}_2^n$. In general,
for any code $C$, there are many possible encoding functions. It may seem odd that the code is
defined purely in terms of the subgroup $C$, not in terms of an encoding function. The reason for
this convention is that no matter which encoding function is chosen, exactly the same set of errors
can be corrected.

Box 11.1

Group Homomorphisms

A homomorphism $f$ from a group $G$ to a group $H$ is a map $f : G \to H$ that satisfies, for any elements
$g_1$ and $g_2$ of $G$,

\[ f(g_1 \circ g_2) = f(g_1) \circ f(g_2). \]

The product used on the left-hand side is the product for group $G$, while on the right-hand side it is
the product for group $H$. An isomorphism from group $G$ to group $H$ is a homomorphism that is both
one-to-one and onto. If there is an isomorphism between $H$ and $G$, $H$ and $G$ are isomorphic, written
$H \cong G$.

The kernel of a homomorphism $f : G \to H$ is the set of elements of $G$ that are mapped to the
identity element $e_H$ of $H$.

To encode a length $mk$ message, each of the $m$ blocks of length $k$ is separately encoded using
$c$ to obtain an encoded message of length $mn$. For this reason these codes are called block codes. The
encoding function $c$ can be represented by an $n \times k$ generator matrix $G$ that takes a message word,
an element of $\mathbf{Z}_2^k$ viewed as a length $k$ column vector, to a codeword, an element of $C \subset \mathbf{Z}_2^n$: the
generator matrix $G$ multiplied with a message word gives the corresponding codeword. The $k$
columns of $G$ form a linearly independent set of binary words.


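Example 11.2.1 The [3, 1] repetition code. The [3, 1] repetition code is defined to be the subset
$C = \{000, 111\}$ of all 3-bit strings. This subset is a subgroup of $\mathbf{Z}_2^3$ under bitwise addition
modulo 2.

The standard encoding function sends

\[ 0 \mapsto 000, \qquad 1 \mapsto 111, \]

and the associated generator matrix is

\[ G = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \]

which acts on bit strings viewed as column vectors:

\[ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} (0) = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} (1) = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. \]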

A more interesting code is the [7, 4] Hamming code. A widely used quantum code, the Steane 
code, is built using special properties of the [7, 4] Hamming code. The Steane code will be 
introduced in section 11.3.3 and is a member of some major code families, including CSS codes 
and stabilizer codes, which are the subjects of sections 11.3 and 11.4 respectively. 


Example 11.2.2 The [7, 4] Hamming code. The [7, 4] Hamming code $C$ encodes 4-bit strings,
elements of $\mathbf{Z}_2^4$, in 7-bit strings, elements of $\mathbf{Z}_2^7$. The code $C$ is the subgroup of $\mathbf{Z}_2^7$ generated
by $\{1110100, 1101010, 1011001, 1111111\}$. The reasoning behind this construction will become
clear in section 11.2.5. One encoding function for $C$ sends

1000 ↦ 1110100
0100 ↦ 1101010
0010 ↦ 1011001
0001 ↦ 1111111

These relations, together with linearity, fully define the encoding. The generator matrix $G'$ for
this encoding is

\[ G' = \begin{pmatrix} 1 & 1 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix}^T. \]

An alternative encoding function sends

1000 ↦ 1000111
0100 ↦ 0100110
0010 ↦ 0010101
0001 ↦ 0001011

with generator matrix $G$

\[ G = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 \end{pmatrix}. \]
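
That the two encodings define the same code can be confirmed by enumerating both sets of codewords. The sketch below forms every linear combination over $\mathbf{Z}_2$ of each generating set from this example and compares the results; the helper name `span` is a hypothetical choice.

```python
from itertools import product

def span(generators):
    """All linear combinations over Z_2 of the given bit strings."""
    n = len(generators[0])
    words = set()
    for coeffs in product((0, 1), repeat=len(generators)):
        word = [0] * n
        for c, g in zip(coeffs, generators):
            if c:
                word = [w ^ int(b) for w, b in zip(word, g)]
        words.add("".join(map(str, word)))
    return words

first = ["1110100", "1101010", "1011001", "1111111"]    # columns of G'
second = ["1000111", "0100110", "0010101", "0001011"]   # columns of G

code_first = span(first)
code_second = span(second)
same_code = code_first == code_second
```

Both sets contain the same 16 codewords, so the choice between $G'$ and $G$ changes which message maps to which codeword but not the code itself.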


11.2.2 Quantum Error Correcting Codes 

An $[[n, k]]$ quantum block code $C$ is a $2^k$-dimensional subspace of the vector space $V$ associated
with the state space of an $n$-qubit system. The double square brackets are used to distinguish
$[[n, k]]$ quantum codes from $[n, k]$ classical codes. View $W$, the $k$-qubit message space, as the
subspace of $V$ that has as basis the subset of the standard basis consisting of all strings in which the
first $n - k$ elements are 0. Any unitary transformation $U_C : V \to V$ that takes $W$ to $C$ is a possible
encoding operator for code $C$. In most cases we do not care how $U_C$ behaves on states outside $W$,
so frequently when we define an encoding operator $U_C$ we will specify only its behavior on $W$
and not on all of $V$. Elements $|w\rangle \in W$ are called message words, and elements of $C$ are called
codewords in analogy with the classical case. This terminology should not be taken too literally;
neither message words in $W$ nor codewords in $C$ are bit strings, but rather quantum states of $k$
and $n$ qubits, respectively.

Just as in the classical case, it is the subspace $C$, not the encoding function, that defines the
code; the same set of errors can be corrected no matter which encoding function is used. Given
an encoding function and any state represented by $|w\rangle \in W$, the image $U_C(|w\rangle) = |\tilde{w}\rangle$ of $|w\rangle$ is
an $n$-qubit state referred to as the logical $k$-qubit state corresponding to $|w\rangle$.


Example 11.2.3 The bit-flip code revisited. The code $C$ is the subspace spanned by $\{|000\rangle, |111\rangle\}$.
The standard encoding operator is

\[ U_C : |0\rangle \mapsto |000\rangle, \qquad |1\rangle \mapsto |111\rangle. \]

So $|\tilde{0}\rangle = |000\rangle$ and $|\tilde{1}\rangle = |111\rangle$.

Strictly speaking, we should write

\[ U_C : |000\rangle \mapsto |000\rangle, \qquad |001\rangle \mapsto |111\rangle \]

and define $U_C$ on the rest of $V$, but we will generally define encoding functions in this way, since
we do not care how the encoding behaves on states outside $W$, and the function definition is easier
to read if we leave off the initial prefix string of zeros.


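Example 11.2.4 The Shor code revisited. The Shor code is a $[[9, 1]]$ code, where $C$ is the
two-dimensional subspace spanned by

\[ \frac{1}{\sqrt{8}}(|000\rangle + |111\rangle)^{\otimes 3} \]

and

\[ \frac{1}{\sqrt{8}}(|000\rangle - |111\rangle)^{\otimes 3}. \]

The standard encoding operator used with this code sends

\[ |0\rangle \mapsto |\tilde{0}\rangle = \frac{1}{\sqrt{8}}(|000\rangle + |111\rangle)^{\otimes 3}, \]

\[ |1\rangle \mapsto |\tilde{1}\rangle = \frac{1}{\sqrt{8}}(|000\rangle - |111\rangle)^{\otimes 3}, \]

but any other function mapping $|0\rangle$ and $|1\rangle$ to two orthogonal vectors within the subspace $C$ would
also be a legitimate encoding function.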


In practice it is not necessary to implement the encoding and decoding functions. At the 
beginning of a computation we simply construct the valid starting state, and at the end we interpret 
the classical information obtained from measurement to deduce information about the final logical 
state. Sections 11.2.8 and 11.4.4, and then much of chapter 12, discuss how to compute directly 
on the encoded data. 

11.2.3 Correctable Sets of Errors for Classical Codes 

A classical error may be viewed as an $n$-bit string $e \in \mathbf{Z}_2^n$ that acts on codewords through bitwise
addition $\oplus$, flipping a subset of the code bits. Any code $C$ corrects some sets of errors and not
others. A set of errors $\mathcal{E}$ is said to be correctable by code $C$ if, for any $w$ in $\mathbf{Z}_2^n$, there is at most one
error that could result in $w$: for all distinct $e_1, e_2 \in \mathcal{E}$ and all $c_1, c_2 \in C$,

\[ e_1 \oplus c_1 \neq e_2 \oplus c_2. \tag{11.1} \]

This condition is called the disjointness condition for classical error correction. Usually $\mathcal{E}$ is
taken to be a group under bitwise addition modulo 2, so $\mathcal{E}$ contains the identity element, the
non-error $00 \cdots 0$. The disjointness condition for $e_1 = 00 \cdots 0$ means that a correctable error
cannot take a codeword to a different codeword. For any code $C$, there are many possible sets
of correctable errors. Some correctable sets of errors are better than others from a practical point
of view.


Example 11.2.5 Correctable error sets for the [3, 1] repetition code. The set
$\mathcal{E} = \{000, 001, 010, 100\}$ is a correctable set of errors for the [3, 1] repetition code $C$. The set
$\mathcal{E}' = \{000, 011, 101, 110\}$ is also a correctable set of errors for $C$. The union of $\mathcal{E}$ and $\mathcal{E}'$ is not
a correctable set for $C$.
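
The three claims of this example can be verified by brute force. The following sketch tests the disjointness condition 11.1 directly, checking that no two distinct errors send codewords to the same word; the function names are hypothetical.

```python
from itertools import product

def xor(a, b):
    """Bitwise addition modulo 2 of two equal-length bit strings."""
    return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))

def is_correctable(errors, code):
    """Disjointness condition: distinct errors never produce the same word."""
    for e1, e2 in product(errors, repeat=2):
        if e1 == e2:
            continue
        for c1, c2 in product(code, repeat=2):
            if xor(e1, c1) == xor(e2, c2):
                return False
    return True

code = {"000", "111"}
single_flips = {"000", "001", "010", "100"}
double_flips = {"000", "011", "101", "110"}

single_ok = is_correctable(single_flips, code)
double_ok = is_correctable(double_flips, code)
union_ok = is_correctable(single_flips | double_flips, code)
```

The union fails because, for instance, the error 001 acting on 000 and the error 110 acting on 111 both produce the word 001.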


11.2.4 Correctable Sets of Errors for Quantum Codes 

For classical error correction, it suffices to consider bit-flip errors, a simple discrete set of errors. 
For quantum error correction, neither the encoded states nor the possible errors form a discrete set. 
For this reason, specifying correctable sets of errors for a quantum code C is more complicated 
than for a classical code. Fortunately, it is simpler than we might at first fear. 

Let $B_C = \{|c_1\rangle, \ldots, |c_{2^k}\rangle\}$ be an (orthonormal) basis for $C$. A finite set $\mathcal{E} = \{E_1, E_2, \ldots, E_L\}$ of
unitary transformations $E_i : V \to V$ is said to be a correctable set of errors for code $C$ if there
exists a matrix $M$ with entries $m_{ij}$ such that

\[ \langle c_a | E_i^\dagger E_j | c_b \rangle = m_{ij} \delta_{ab} \tag{11.2} \]

for all $|c_a\rangle, |c_b\rangle \in C$ and $E_i, E_j \in \mathcal{E}$. The next few paragraphs clarify the meaning and motivation
for this definition.

Just as in the classical case, there are many possible sets of correctable errors for a code
$C$. Furthermore, there is no maximal correctable set, but some sets are more useful than others
from a practical point of view. To perform error correction, one set of correctable errors
is chosen, and the error correction procedures are designed with respect to that set. In the
quantum case, the set of errors corrected by these procedures is much larger than the original
correctable set $\mathcal{E}$: section 11.2.9 shows that if there is a procedure for a code $C$ that corrects a
set of errors $\mathcal{E} = \{E_1, E_2, \ldots, E_L\}$, then any superposition or mixture of errors in $\mathcal{E}$ can also
be corrected by code $C$. It is this property that enables the correction of the general errors,
discussed in section 10.4, that can be modeled as probabilistic mixtures of linear transformations.
Since unitary errors $E$ are easily corrected by applying the inverse transform $E^\dagger$, the
errors of a correctable set have a clear error correction procedure once the error is known.
The next two paragraphs give intuitive justification for the Correctable Error Set Condition
(equation 11.2).

Just as in the classical case, there is no hope of correctly recovering from a set of errors $\mathcal{E}$ that
contains a pair of error transformations that take two different codewords to the same state. The
quantum case has a stronger requirement along these lines: any two distinct errors in $\mathcal{E}$ must take
orthogonal codewords to orthogonal states. The reason for this requirement is that in order to
determine which error is likely to have occurred, we need to make measurements, and two states
can be distinguished with certainty if and only if the two states are orthogonal. This condition
guarantees that the images of two different codewords under errors in $\mathcal{E}$ are distinguishable if the
original codewords are distinguishable. This condition is written

\[ \langle c | E_i^\dagger E_j | c' \rangle = 0 \tag{11.3} \]

for all $E_i, E_j \in \mathcal{E}$, and all $|c\rangle, |c'\rangle \in C$ such that $\langle c | c' \rangle = 0$. This orthogonality condition is the
analog of the disjointness condition, equation 11.1, for classical error correction.

In the quantum case, in order for error correction not to destroy the quantum computation,
an additional condition is needed. Measurements made to determine the error must not give any
information about the logical state, since otherwise superpositions may be destroyed, making the
quantum computation useless. For this reason, we require

\[ \langle c_a | E_i^\dagger E_j | c_a \rangle = \langle c_b | E_i^\dagger E_j | c_b \rangle \tag{11.4} \]

for all $|c_a\rangle, |c_b\rangle \in C$ and $E_i, E_j \in \mathcal{E}$. This requirement means that for every pair of indices $i$ and
$j$, there is a value $m_{ij}$ such that

\[ \langle c_a | E_i^\dagger E_j | c_a \rangle = m_{ij}. \]

Putting conditions 11.3 and 11.4 together results in the original equation 11.2:

\[ \langle c_a | E_i^\dagger E_j | c_b \rangle = m_{ij} \delta_{ab} \]

for all $|c_a\rangle, |c_b\rangle \in C$ and $E_i, E_j \in \mathcal{E}$, where a significant part of the meaning of this formula is
that $m_{ij}$ is independent of $a$ and $b$.

Condition 11.2 holds if

\[ \langle c_a | E_i^\dagger E_j | c_b \rangle = 0 \tag{11.5} \]

for all $|c_a\rangle, |c_b\rangle \in C$ and $E_i, E_j \in \mathcal{E}$ such that $i \neq j$, but this condition is stronger than necessary.
If two different errors $E_1$ and $E_2$ take a state $|\psi\rangle$ to the same state $|\psi'\rangle$, no matter which error
occurred, applying $E_1^\dagger$ (or equally well $E_2^\dagger$) corrects the error. Condition 11.5 holds for many
quantum codes, but not for some important codes. A code that does not satisfy this condition is
called a degenerate code for error set $\mathcal{E}$. Shor's code is degenerate, for example: a relative phase
error acting on the first qubit will have the same effect as a relative phase error acting on the
second qubit. The existence of degenerate codes complicates matters. There is no classical analog
for degenerate quantum codes.
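
The degeneracy of Shor's code can be seen by direct computation. The sketch below builds the two logical states $\frac{1}{\sqrt{8}}(|000\rangle \pm |111\rangle)^{\otimes 3}$ as dictionaries from 9-bit strings to amplitudes and applies $Z$ to individual qubits. Numbering qubits by string position (0 is the leftmost) is a convention chosen for the sketch, and the function names are hypothetical.

```python
from itertools import product
from math import sqrt

def shor_codeword(logical):
    """Amplitudes of (|000> + s|111>)^{(x)3} / sqrt(8), with s = +1 for 0 and -1 for 1."""
    sign = -1 if logical else 1
    state = {}
    for blocks in product(("000", "111"), repeat=3):
        amp = 1 / sqrt(8)
        for block in blocks:
            if block == "111":
                amp *= sign
        state["".join(blocks)] = amp
    return state

def apply_z(state, pos):
    """Apply Z to the qubit at string position `pos` (0 = leftmost)."""
    return {bits: (-a if bits[pos] == "1" else a) for bits, a in state.items()}

zero = shor_codeword(0)
same_block = apply_z(zero, 0) == apply_z(zero, 1)     # two qubits of the first block
other_block = apply_z(zero, 0) == apply_z(zero, 3)    # first block versus second block
```

Phase flips on two qubits of the same block give identical states, so condition 11.5 fails for that pair of errors, while phase flips in different blocks remain distinguishable.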

The unitarity of the $E_i$ means that $E_i C$ has dimension $2^k$ for all errors $E_i$. Since there can
be at most $2^{n-k}$ mutually orthogonal subspaces of dimension $2^k$ in a space of dimension $2^n$, the
maximum size of a set $\mathcal{E}$ of correctable errors for a nondegenerate code is $2^{n-k}$. For degenerate
codes, the size of a maximal set of correctable errors can be greater than $2^{n-k}$.

Example 11.2.6 The bit-flip code revisited. The set of errors $\mathcal{E} = \{E_{ij}\}$ with

\[ E_{00} = I \otimes I \otimes I, \quad E_{01} = X \otimes I \otimes I, \quad E_{10} = I \otimes X \otimes I, \quad E_{11} = I \otimes I \otimes X \]

is a correctable error set for the bit-flip code.

The set of errors $\mathcal{E}' = \{E'_{ij}\}$ with

\[ E'_{00} = I \otimes I \otimes I, \quad E'_{01} = I \otimes X \otimes X, \quad E'_{10} = X \otimes I \otimes X, \quad E'_{11} = X \otimes X \otimes I \]

is a different correctable error set for the bit-flip code. In this case, the code corrects all two-qubit
flip errors, but none of the single bit-flip errors. Of course, this set of correctable errors is of little
practical value, since single bit-flip errors are generally more likely than pairs of bit-flip errors.
But it is conceivable that in certain physical implementations, bit-flip errors are more likely to
appear in pairs.
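
Condition 11.2 can be checked mechanically for both error sets. In the sketch below each error is a tensor product of $X$ and $I$, represented by a 3-bit mask of the qubits it flips; since $X$ is its own adjoint, $E_i^\dagger E_j$ flips the bits in the XOR of the two masks, and the matrix element between basis codewords is 1 or 0. This representation and the function names are choices made for the illustration.

```python
from itertools import product

CODEWORDS = (0b000, 0b111)   # basis codewords |000> and |111>

def element(c_a, e_i, e_j, c_b):
    """Matrix element of E_i^dagger E_j between |c_a> and |c_b> for X-type flip masks."""
    return 1 if c_a == c_b ^ e_i ^ e_j else 0

def satisfies_condition(errors):
    """Check that every matrix element equals m_ij * delta_ab."""
    for e_i, e_j in product(errors, repeat=2):
        m_ij = element(CODEWORDS[0], e_i, e_j, CODEWORDS[0])
        for c_a, c_b in product(CODEWORDS, repeat=2):
            expected = m_ij if c_a == c_b else 0
            if element(c_a, e_i, e_j, c_b) != expected:
                return False
    return True

single_flips = (0b000, 0b100, 0b010, 0b001)   # the set E of this example
double_flips = (0b000, 0b011, 0b101, 0b110)   # the set E'

single_ok = satisfies_condition(single_flips)
double_ok = satisfies_condition(double_flips)
union_ok = satisfies_condition(single_flips + double_flips)
```

Each set passes on its own, but their union does not: a single flip combined with the complementary double flip maps $|000\rangle$ to $|111\rangle$, which violates the $\delta_{ab}$ requirement.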


11.2.5 Correcting Errors Using Classical Codes 

Let $C$ be a classical $[n, k]$ linear block code, and suppose $\mathcal{E}$ is a correctable set of errors for $C$.
Suppose $w = e \oplus c$ for some codeword $c \in C$ and error $e \in \mathcal{E}$. We wish to correct $w$ to $c$. To find
$e$ and $c$, it is helpful to consider cosets of the code $C$.

This paragraph shows that there is a unique error associated with each coset. Let $\mathcal{H}$ be the
set of cosets of $C$ in $\mathbf{Z}_2^n$. An error $e \in \mathcal{E}$ changes a codeword $c$ into $e \oplus c$, an element of
some coset of $C$. Given errors $e_1 \neq e_2$ and codewords $c_1$ and $c_2$, by disjointness condition
11.1, $e_1 \oplus c_1$ and $e_2 \oplus c_2$ are in two different cosets. To see this, suppose $e_1 \oplus c_1$ and $e_2 \oplus c_2$
were in the same coset. Then there would exist a $c_3 \in C$ such that $e_1 \oplus c_1 \oplus c_3 = e_2 \oplus c_2$.
But $c_1 \oplus c_3$ is in $C$, which violates the disjointness condition 11.1 that says that two distinct
correctable errors cannot take two codewords to the same word. Thus, knowing to which
coset the word $e \oplus c$ belongs tells us which error $e$ has occurred. Let us make this more
precise.

Box 11.2

Cosets

Given a subgroup $H < G$, for each $a \in G$, the set $aH = \{ah \mid h \in H\}$ is called a (left) coset of $H$
in $G$. (Right cosets are analogously defined, but we do not need to consider them here, so we will
simply refer to left cosets as cosets.)

For $a$ and $b$ in $G$, either $aH = bH$ or $aH \cap bH = \emptyset$, so the cosets partition $G$. Thus, the order
of a subgroup must divide the order of the group and, similarly, the number of distinct cosets must
divide the order of the group. The index of $H$ in $G$ is the number of distinct cosets of $H$ in $G$, and is
denoted by $[G : H]$.

For example, let $G = \mathbf{Z}_n$, and let $H = m\mathbf{Z}_n$ be the set of multiples of $m$ for some integer $m$
dividing $n$. The order of $G$ is $n$, the order of $H$ is $n/m$, and the number of distinct cosets is
$[G : H] = |G|/|H| = m$.

If $K < H < G$, then $[G : K] = |G|/|K| = (|G|/|H|)(|H|/|K|) = [G : H][H : K]$.

Because $\mathbf{Z}_2^n$ is Abelian, the set of all cosets forms a group, $\mathcal{H}$. It is of size $2^{n-k}$. Since $\mathcal{H}$
is Abelian and nontrivial, and all elements of $\mathcal{H}$ have order 2, $\mathcal{H}$ is isomorphic to $\mathbf{Z}_2^{n-k}$. Let
$\alpha : \mathcal{H} \to \mathbf{Z}_2^{n-k}$ be an isomorphism. The map

\[ h : \mathbf{Z}_2^n \to \mathbf{Z}_2^{n-k} \cong \mathcal{H}, \qquad w \mapsto \alpha(w \oplus C) \]

sends all elements of $C$ to the zero element of $\mathbf{Z}_2^{n-k}$; the kernel of $h$ is $C$. The element $h(w)$
characterizes each coset since $h(w) = h(w')$ if and only if $w$ and $w'$ are in the same coset. By
the previous paragraph, there is a unique error $e \in \mathcal{E}$ associated with this coset. Since $h(w)$
characterizes the coset, it also characterizes this error. For this reason, $h(w)$ is called the error
syndrome, or simply syndrome.

More concretely, $h$ can be realized by an $(n - k) \times n$ matrix $P$. To construct a concrete $P$, find
$n - k$ linearly independent elements $p_i$ of $\mathbf{Z}_2^n$ such that $p_i \cdot c = 0 \bmod 2$ for all $c \in C$, and take
these as the rows of the matrix:

\[ P = \begin{pmatrix} p_1 \\ \vdots \\ p_{n-k} \end{pmatrix}. \]

For a given code $C$, there are many possible matrices $P$ (just as there are many possible
isomorphisms $\alpha$). The matrix $P$, acting on $w \in \mathbf{Z}_2^n$ viewed as a column vector, produces an $n - k$
length binary column vector $Pw$, the syndrome, that characterizes the coset of $C$ containing $w$.
Each of these $n - k$ values is the inner product (mod 2) of $w$ with a row of $P$. For this reason,
the rows $p_i$ are called parity checks and $P$ is called a parity check matrix for code $C$. The parity
check matrix $P$ distinguishes between distinct correctable errors $e_i$ and $e_j$ since $P(e_i) \neq P(e_j)$.
If $G$ is a generator matrix for the codewords of $C$, and $P$ is an $(n - k) \times n$ matrix with linearly
independent rows, the $(n - k) \times k$ product matrix $PG$ is 0 if and only if $P$ is a parity check matrix
for $C$. The code $C$ is both the image of $\mathbf{Z}_2^k$ in $\mathbf{Z}_2^n$ under $G$, and the kernel of $P$, the set of elements
of $\mathbf{Z}_2^n$ sent to $00 \cdots 0$ under $P$.

Hamming codes are among the simplest classical codes and are used as the basis for many
quantum codes. There is a Hamming code $C_n$ for every integer $n \geq 2$. A parity check matrix for
Hamming codes has columns consisting of all the non-zero $n$-bit strings. Since the parity check
matrix for the Hamming code $C_n$ is an $n \times (2^n - 1)$ matrix, the generator matrix for $C_n$ is therefore
a $(2^n - 1) \times (2^n - n - 1)$ matrix, and the Hamming code $C_n$ is a $[2^n - 1, 2^n - n - 1]$ code. All
Hamming codes correct single bit-flip errors.

Example 11.2.7 The Hamming code $C_2$. The [3, 1] repetition code is also the Hamming code $C_2$,
the code with parity check matrix

\[ P = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}. \]

A different parity check matrix for the same code is

\[ P' = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}. \]

The matrix $P'$ has form $(A|I)$. By exercise 11.2, $\begin{pmatrix} I \\ A \end{pmatrix}$ is a generator matrix for the code. The
generator matrix obtained in this way from $P'$ is

\[ G = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. \]

The code $C_2$ is called a repetition code, since $0 \mapsto 000$ and $1 \mapsto 111$.


Example 11.2.8 The Hamming code $C_3$. The Hamming code $C_3$ is a [7, 4] code. Section 11.3.3
uses $C_3$ to define the quantum Steane code.

A parity check matrix for the [7, 4] Hamming code is

$$P' = \begin{pmatrix} 0&0&0&1&1&1&1 \\ 0&1&1&0&0&1&1 \\ 1&0&1&0&1&0&1 \end{pmatrix};$$

its columns are exactly the seven non-zero 3-bit strings. Our next task is to find a generator matrix
$G'$ for $C_3$. Since each row of $P'$ contains an even number of 1s, each row is orthogonal to itself.
Furthermore, the rows are orthogonal to each other, that is, $P'P'^T = 0$, so we may take as
the first three columns of $G'$ the transposes of the rows of $P'$. We need to find one other vector
orthogonal to the rows of $P'$ and linearly independent of these columns. The vector

$$(1\ 1\ 1\ 1\ 1\ 1\ 1)^T$$

satisfies both conditions. So a generator matrix for the [7, 4] Hamming code is

$$G' = \begin{pmatrix} 0&0&0&1&1&1&1 \\ 0&1&1&0&0&1&1 \\ 1&0&1&0&1&0&1 \\ 1&1&1&1&1&1&1 \end{pmatrix}^T.$$
Alternatively, the [7, 4] Hamming code can be defined in terms of a more convenient parity
check matrix of the form $(A|I)$,

$$P = \begin{pmatrix} 1&1&1&0&1&0&0 \\ 1&1&0&1&0&1&0 \\ 1&0&1&1&0&0&1 \end{pmatrix}.$$

By exercise 11.2, a generator matrix corresponding to a parity check matrix of this form
is $\begin{pmatrix} I \\ A \end{pmatrix}$,

$$G = \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \\ 1&1&1&0 \\ 1&1&0&1 \\ 1&0&1&1 \end{pmatrix}.$$
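The passage from a parity check matrix of the form $(A|I)$ to a generator matrix with $I$ stacked on $A$ can be checked mechanically. The following is a minimal sketch; the helper names `generator_from_systematic` and `matmul_mod2` are hypothetical, and matrices are represented as plain lists of rows.

```python
def generator_from_systematic(P):
    """Given P = (A | I) with r rows, return G = (I stacked on A) as a list of rows."""
    r, n = len(P), len(P[0])
    k = n - r
    if [row[k:] for row in P] != [[int(i == j) for j in range(r)] for i in range(r)]:
        raise ValueError("P is not of the form (A | I)")
    A = [row[:k] for row in P]
    identity = [[int(i == j) for j in range(k)] for i in range(k)]
    return identity + A

def matmul_mod2(M, N):
    """Matrix product over Z_2."""
    return [[sum(M[i][t] * N[t][j] for t in range(len(N))) % 2
             for j in range(len(N[0]))] for i in range(len(M))]

P = [[1, 1, 1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0, 1, 0],
     [1, 0, 1, 1, 0, 0, 1]]
G = generator_from_systematic(P)
PG = matmul_mod2(P, G)  # every entry is A + A = 0 (mod 2)
```

The product vanishes because each entry of $PG$ is the corresponding entry of $A$ added to itself modulo 2.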


11.2.6 Diagnosing and Correcting Errors Using Quantum Codes 

This section describes a procedure for correcting errors handled by nondegenerate quantum codes.
Let $C$ be an $[[n, k]]$ quantum code that is nondegenerate with respect to a correctable error set
$\mathcal{E} = \{E_i\}$, where $0 \leq i < M$. Suppose $|w\rangle = E_s|v\rangle$ for some $E_s \in \mathcal{E}$ and $|v\rangle \in C$. Because $C$
is nondegenerate with respect to $\mathcal{E}$, the subspaces $E_iC$ and $E_jC$ are orthogonal for all $i \neq j$,
so $E_s|v\rangle$ is the only way to obtain $|w\rangle$ from a codeword in $C$ and an error in $\mathcal{E}$: the elements
$E_s$ and $|v\rangle$ are unique. Thus, if we can determine in which subspace $E_sC$ the state $|w\rangle$ lives,
from among the $M$ subspaces $\{E_iC\}$, we can correct the error by applying $E_s^\dagger$ to $|w\rangle$. To make
this determination, we must measure the state $|w\rangle$. The standard model of quantum computation
allows only single-qubit measurements in the standard basis. Any other measurement can be
carried out by computing into ancilla qubits and measuring each of these in the standard basis,
but only some measurements can be efficiently carried out in this way. This section presents a
general framework. Later sections of this chapter and the next consider implementation issues
with respect to specific codes.

The aim of the measurement is to determine in which error subspace the state $|w\rangle$ lies. Let
$W_i = E_iC$, and

$$W = \bigoplus_{i=0}^{M-1} W_i.$$

Let $W^\perp$ be the possibly empty subspace of the computational space $V$ orthogonal to $W$; vectors
in $W^\perp$ are orthogonal to all codewords and also to all states that are images of codewords under
a correctable error $E_i \in \mathcal{E}$. For notational convenience, define $W_M = W^\perp$. Since $|w\rangle$ is the
result of an error $E_i$ applied to a codeword, by definition of $W_M$, $|w\rangle$ does not lie in $W_M$.
Since the $W_i$ are mutually orthogonal, there is an observable $O$ with eigensubspaces exactly
the $W_i$.

Let $P_j$ be the projector onto the subspace $W_j$. Let $m = \lceil \log_2 M \rceil$, and let $U_P$ be a unitary
operator on $n + m$ qubits such that

$$U_P : |w\rangle|0\rangle \mapsto \sum_{j=0}^{M-1} b_j |w_j\rangle|j\rangle, \qquad (11.6)$$

where $|w\rangle = \sum_{j=0}^{M-1} b_j|w_j\rangle$ is written in terms of its components $b_j|w_j\rangle = P_j|w\rangle$. Measuring
the $m$ auxiliary qubits in the standard basis gives the error syndrome, the subspace index $j$. By
definition of $W_M$, the index $M$ cannot occur. After measurement, the state of the first $n$ qubits is
in the subspace $W_j = E_jC$, so applying the operator $E_j^\dagger$ corrects the error. The operator $U_P$ is
called a syndrome extraction operator since it plays a similar role to the syndrome in classical
error correction. The notation $U_P$ is meant to suggest a unitary operator that plays the role of
the parity check matrix $P$ in classical error correction. Since the labels for the subspaces can be
arbitrarily chosen, many different unitary operators can serve as a syndrome extraction operator
for a given code $C$ and error set $\mathcal{E}$.

Measuring a single qubit $l$ of the $m$ auxiliary qubits on its own corresponds to a binary observable
with two $2^{n-1}$-dimensional eigensubspaces, the subspace spanned by all of the $W_i$ for which the
$l$th bit of the binary representation of its index $i$ is 0, and the subspace spanned by all of the $W_i$
for which the $l$th bit of the binary representation of its index $i$ is 1. In this way, the syndrome
extraction operator can be viewed as a set of $m$ observables.


Example 11.2.9 The bit-flip code revisited. Consider the bit-flip code $C$ and the set of correctable
errors $\mathcal{E} = \{E_{ij}\}$ with

$$E_{00} = I \otimes I \otimes I, \quad E_{01} = X \otimes I \otimes I, \quad E_{10} = I \otimes X \otimes I, \quad E_{11} = I \otimes I \otimes X.$$

More simply, $E_{00} = I$, $E_{01} = X_2$, $E_{10} = X_1$, and $E_{11} = X_0$, where $X_i$ is the operator $X$ applied
to the $i$th qubit. The orthogonal subspaces corresponding to this error set are $W_{00} = E_{00}C$, $W_{01} =
E_{01}C$, $W_{10} = E_{10}C$, and $W_{11} = E_{11}C$ with bases $B_{00} = \{|000\rangle, |111\rangle\}$, $B_{01} = \{|100\rangle, |011\rangle\}$,
$B_{10} = \{|010\rangle, |101\rangle\}$, and $B_{11} = \{|001\rangle, |110\rangle\}$, respectively. The operator

$$U_P : |x_2, x_1, x_0, 0, 0\rangle \to |x_2, x_1, x_0, b_1 = x_1 \oplus x_0, b_0 = x_2 \oplus x_0\rangle$$

serves as a syndrome extraction operator for $C$ with error set $\mathcal{E}$. Measuring bit $b_1$ in the standard
basis distinguishes between the eigenspaces spanned by subspaces $\{W_{00}, W_{01}\}$ and $\{W_{10}, W_{11}\}$
respectively. Similarly, measuring $b_0$ distinguishes between errors in the spaces spanned by
$\{W_{00}, W_{10}\}$ and $\{W_{01}, W_{11}\}$. Measuring $b_1$ and $b_0$ as $i$ and $j$ projects the state into $W_{ij} = E_{ij}C$.
The error can be corrected by applying $E_{ij}^\dagger$. If, for example, measuring the ancilla $b_1$ and $b_0$ yields
0 and 1 respectively, we apply the transformation $X_2$.

Measuring $b_0$ (resp. $b_1$) directly, without the use of ancilla bits, can be done using the observable
$Z \otimes I \otimes Z$ (resp. $I \otimes Z \otimes Z$). Compare the classical parity check matrix

$$P = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}$$

for the [3,1] code with the array

$$\begin{array}{ccc} Z & I & Z \\ I & Z & Z \end{array}$$

where the factors of the two observables have been placed in the rows. In the classical case, the
parity check matrix multiplied by a word will be 0 if the word is a codeword. At least one of
the rows of the parity check matrix when multiplied by a non-codeword will be non-zero. In the
quantum case a codeword is in the $+1$-eigenspace for all the observables, and non-codewords
are in the $-1$-eigenspace for at least one of the observables. The stabilizer codes of section 11.4
exploit this connection.
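On standard-basis states the syndrome extraction operator of this example acts classically, so its bookkeeping can be sketched without simulating amplitudes. The sketch below is illustrative only; the names `syndrome_bits`, `CORRECTION`, `flip`, and `correct` are hypothetical, and bit tuples are ordered $(x_2, x_1, x_0)$ as in the example.

```python
def syndrome_bits(x2, x1, x0):
    """Ancilla values (b1, b0) written by the syndrome extraction operator."""
    return (x1 ^ x0, x2 ^ x0)

# Qubit to flip for each syndrome (b1, b0); None means no correction is needed.
CORRECTION = {(0, 0): None, (0, 1): 2, (1, 0): 1, (1, 1): 0}

def flip(bits, qubit):
    """Apply X to one qubit of (x2, x1, x0); qubit 2 is the leftmost entry."""
    out = list(bits)
    out[2 - qubit] ^= 1
    return tuple(out)

def correct(bits):
    """Look up the syndrome and undo the corresponding single bit flip."""
    qubit = CORRECTION[syndrome_bits(*bits)]
    return bits if qubit is None else flip(bits, qubit)
```

For instance, `syndrome_bits(1, 0, 0)` returns `(0, 1)`, the syndrome for which the example applies $X_2$.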


Before turning to another example, we use this example to illustrate an alternative to the
syndrome measurement. The use of ancilla qubits in quantum error correction to correct general
errors is one of the most elegant and surprising aspects of quantum computing. By computing
information into ancilla qubits and measuring them, nonunitary errors can be converted to unitary
errors. When the result of the measurement tells us which unitary error remains, we can correct
it by applying the inverse unitary operator. Alternatively, and equivalently, instead of measuring
the ancilla after computing into them, a controlled operation from the ancilla qubits to the
computational qubits can correct the error. In general, for $\mathcal{E} = \{E_s\}$, instead of measuring after applying
$U_P$ to the computational system and the ancilla, apply the following controlled operation with
the ancilla as the control bits:

$$V_P = \sum_s E_s^\dagger \otimes |s\rangle\langle s|.$$

In this way, errors can be corrected without measurement.


Example 11.2.10 Bit-flip code $C_{bf}$ correction by controlled operations. After applying $U_{bf}$
in example 11.1.2, instead of measuring, a controlled operation $V_P$ from the ancilla qubits to
the computational qubits can be performed, one that applies each of the three error correction
transformations when the ancilla qubits are in the corresponding state:

$$V_P = I \otimes |00\rangle\langle 00| + X_2 \otimes |01\rangle\langle 01| + X_1 \otimes |10\rangle\langle 10| + X_0 \otimes |11\rangle\langle 11|.$$

The circuit for this controlled operation is

[Circuit diagram not recoverable from the source: the two ancilla qubits control $X$ gates on the computational qubits $|x_2\rangle$, $|x_1\rangle$, $|x_0\rangle$.]

Suppose an error $E = \alpha X_2 + \beta X_1$ has occurred on the encoded state $|\tilde{0}\rangle = |000\rangle$. Applying this circuit to the state

$$U_{bf}(E|\tilde{0}\rangle \otimes |00\rangle) = \alpha|100\rangle|01\rangle + \beta|010\rangle|10\rangle$$

results in

$$\alpha|000\rangle|01\rangle + \beta|000\rangle|10\rangle = |000\rangle(\alpha|01\rangle + \beta|10\rangle).$$

We may wish to measure the two ancilla qubits in order to transform them to $|00\rangle$ so that they
can be reused in a later error correction step, but measurement is not required to achieve quantum
error correction.
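The measurement-free correction can be traced on a small state vector. The sketch below stores a state as a dictionary from basis labels $(x_2, x_1, x_0, b_1, b_0)$ to amplitudes; this representation, the function names, and the sample amplitudes 0.6 and 0.8 are assumptions made for illustration. The syndrome convention is the one of example 11.2.9.

```python
from collections import defaultdict

def extract_syndrome(state):
    """Write b1 = x1 xor x0 and b0 = x2 xor x0 into the two ancilla bits."""
    out = defaultdict(complex)
    for (x2, x1, x0, b1, b0), amp in state.items():
        out[(x2, x1, x0, b1 ^ x1 ^ x0, b0 ^ x2 ^ x0)] += amp
    return dict(out)

# Qubit flipped by the controlled operation for each ancilla value (b1, b0).
FLIP_FOR = {(0, 1): 2, (1, 0): 1, (1, 1): 0}

def controlled_correction(state):
    """Apply X_2, X_1 or X_0 to the data qubits, controlled on the ancilla."""
    out = defaultdict(complex)
    for (x2, x1, x0, b1, b0), amp in state.items():
        data = [x2, x1, x0]
        if (b1, b0) in FLIP_FOR:
            data[2 - FLIP_FOR[(b1, b0)]] ^= 1
        out[(data[0], data[1], data[2], b1, b0)] += amp
    return dict(out)

alpha, beta = 0.6, 0.8
# (alpha X_2 + beta X_1)|000> with both ancilla qubits in |0>.
damaged = {(1, 0, 0, 0, 0): alpha, (0, 1, 0, 0, 0): beta}
corrected = controlled_correction(extract_syndrome(damaged))
```

After the two steps every term has data qubits 000, and the amplitudes have moved onto the ancilla, mirroring the factorization $|000\rangle(\alpha|01\rangle + \beta|10\rangle)$.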






Example 11.2.11 The phase-flip code revisited. Recall that the relative phase code of section
11.1.2 is dual to the bit-flip code of example 11.2.9 through the transformation $W = H \otimes H \otimes H$.
Applying $W$ to all states and replacing all transformations $T$ used in example 11.2.9 with
$WTW$ results in an error correction procedure for the relative phase code. Since $X = HZH$, the
observables corresponding to the syndrome operator $U'_P$ are $X \otimes I \otimes X$ and $I \otimes X \otimes X$, which
have corresponding array

$$\begin{array}{ccc} X & I & X \\ I & X & X \end{array}$$

which is related to the classical parity check matrix

$$P = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}.$$

In this case, errors can be corrected without measurement using

$$V'_P = I \otimes |00\rangle\langle 00| + Z_2 \otimes |01\rangle\langle 01| + Z_1 \otimes |10\rangle\langle 10| + Z_0 \otimes |11\rangle\langle 11|.$$
11.2.7 Quantum Error Correction Across Multiple Blocks

Just as the classical $[n, k]$ block codes of section 11.2.1 encode length $mk$ bit-strings as a length
$mn$ bit-string by encoding each of the $m$ blocks of $k$ bits, a quantum $[[n, k]]$ code $C$ encodes $mk$
logical qubits in $mn$ computational qubits by encoding each of the $m$ blocks of $k$ logical qubits using
$C$. A logical superposition such as

$$|\psi\rangle = \sum_i \sum_j \alpha_{ij} (|w_i\rangle \otimes |w_j\rangle)$$

is encoded as

$$|\tilde{\psi}\rangle = \sum_i \sum_j \alpha_{ij} (|c_i\rangle \otimes |c_j\rangle),$$

where $|c_i\rangle = U_C|w_i\rangle$ and $U_C$ is an encoding function for $C$. Quantum block codes must be able to
correct errors on such superpositions. Furthermore, if $C$ can correct errors $E_i \in \mathcal{E}$, then $C$ applied
blockwise must be able to correct errors of the form $E_{i_1} \otimes \cdots \otimes E_{i_m}$ on the encoded state. The
rest of this section illustrates, in the two-block case, quantum error correction on superpositions
and across multiple blocks.

Suppose the encoded state $|\tilde{\psi}\rangle = \sum_{ij} \alpha_{ij} (|c_i\rangle \otimes |c_j\rangle)$ were subject to error $E_a \otimes E_b$, where
$E_a$ and $E_b$ are both correctable errors for code $C$. Applying the syndrome extraction operator $U_P$
for $C$ to each block separately, measuring the ancilla for each block, and applying the appropriate
correcting operators will restore the state $|\tilde{\psi}\rangle$:

$$U_P \otimes U_P \left( (E_a \otimes E_b |\tilde{\psi}\rangle) \otimes |0\rangle|0\rangle \right) = \sum_{ij} \alpha_{ij} \left( U_P(E_a|c_i\rangle|0\rangle) \right) \otimes \left( U_P(E_b|c_j\rangle|0\rangle) \right)$$

$$= \sum_{ij} \alpha_{ij} \left( E_a|c_i\rangle|a\rangle \otimes E_b|c_j\rangle|b\rangle \right),$$

where we have reordered the qubits for clarity. Measurement of the two ancilla yields $|a\rangle$ and $|b\rangle$
respectively, with the computation qubits in state $|\phi\rangle = \sum_{ij} \alpha_{ij} (E_a|c_i\rangle \otimes E_b|c_j\rangle)$. The syndrome
$|a\rangle|b\rangle$ indicates that the error can be corrected by applying $E_a^\dagger \otimes E_b^\dagger$. Applying $E_a^\dagger \otimes E_b^\dagger$ does
indeed correct the error:

$$E_a^\dagger \otimes E_b^\dagger |\phi\rangle = \sum_i \sum_j \alpha_{ij} (|c_i\rangle \otimes |c_j\rangle) = |\tilde{\psi}\rangle.$$

11.2.8 Computing on Encoded Quantum States 

For error correcting codes to be useful for quantum computation, we must still be able to perform
computation on the states after they are encoded. Let $C \subset V$ be an $[[n, k]]$ quantum code and let
$U_C$ be an encoding function $U_C : W \to C$. In order to perform general computation on encoded
states, for any unitary operator $U : W \to W$, we must find an analogous unitary operator $\tilde{U}$
acting on the encoded states, one that for all $|w\rangle \in W$ sends $U_C(|w\rangle)$ to $U_C(U|w\rangle)$. Because
we do not care how $\tilde{U}$ behaves outside $C$, there are many unitary operators acting on $V$ that
have this property. For a given unitary operator $\tilde{U}$, there are many ways to implement it in terms
of basic gates. Furthermore, for a given $U : W \to W$ with two logical analogs $\tilde{U} : V \to V$ and
$\tilde{U}' : V \to V$, one of $\tilde{U}$ and $\tilde{U}'$ may be more efficiently implementable than the other, and some
implementations have better robustness properties than others.

One such operator can be constructed using the encoding operator. Let $U_C$ be the unitary
encoding function that sends $|w\rangle \otimes |0\rangle$ to $|\tilde{w}\rangle$. The transformation $U_C^\dagger$ sends a valid codeword $|\tilde{w}'\rangle$
to $|w'\rangle \otimes |0\rangle$. The operator $\tilde{U} = U_C(U \otimes I)U_C^\dagger$ acts as desired on the code space; $\tilde{U}$ is the logical
equivalent to $U$ on the encoded states. In general, however, this construction yields a $\tilde{U}$ with poor
robustness properties: after applying $U_C^\dagger$, the state is unencoded, making it extremely vulnerable
to any errors that occur during this time. Chapter 12 takes a careful look at how logical operations
are best implemented on encoded states.

11.2.9 Superpositions and Mixtures of Correctable Errors Are Correctable 

Section 10.4 showed that general errors $E$ can be modeled as probabilistic mixtures of linear
transformations, and that these linear error transformations $A_i$ are not necessarily unitary:

$$E : \rho \mapsto \sum_{i=1}^{K} A_i \rho A_i^\dagger.$$

This section shows that errors that are non-zero complex linear combinations of elements of a
correctable error set $\mathcal{E}$ for a code $C$ can be corrected by this code. The term set of correctable
errors refers to a set of errors the code can correct via a unitary transformation, but the set
of errors the code corrects is much larger: all linear combinations of such errors. Measurement
is used to project a linear combination of errors onto one of the correctable errors, and it is also
used to detect which error remains after measurement so that the corresponding unitary error
transformation can be applied. As in the classical case, there are many possible maximal sets of
correctable errors for a given code, and some of these distinct maximal sets of correctable errors
generate distinct subspaces.

Let error $E = \sum_i \alpha_i E_i$ be a probabilistic mixture of errors, a linear combination of errors
$E_i$ from a correctable set $\mathcal{E}$ such that $\sum_i |\alpha_i|^2 = 1$. The error $E$ may or may not be unitary, so
we consider the general case and show that, if $E$ takes a codeword $|c\rangle$, with density operator
$\rho = |c\rangle\langle c|$, to a mixed state $\rho' = E\rho E^\dagger$, we can correct for the error. The mixed state $\rho'$ can be
written

$$\rho' = \sum_i |\alpha_i|^2 E_i |c\rangle\langle c| E_i^\dagger.$$

Since the $E_i|c\rangle$ are mutually orthogonal and $\sum_i |\alpha_i|^2 = 1$, $\rho'$ has trace 1 and is a mixed state.
Thus $\rho'$ is a probability distribution over the orthogonal pure states $E_i|c\rangle$. Consider the observable
$O = \sum_i \lambda_i P_i$ where the $\lambda_i$ are distinct and $P_i$ is the projector onto the subspace $E_iC$. Using the
definitions in section 10.3, measurement with $O$ results in the state $E_i|c\rangle\langle c|E_i^\dagger$, the normalization
of $P_i \rho' P_i^\dagger$, with probability $|\alpha_i|^2$. Thus, after measurement, we have a pure state $E_i|c\rangle$. The
measurement result, $\lambda_i$, tells us in which subspace $E_iC$ the state resides. Applying $E_i^\dagger$ corrects the state.

11.2.10 The Classical Independent Error Model 

In both the quantum and classical case, a general error correction strategy consists of three
parts: detecting non-codewords, determining the most likely error, and applying a transformation
to correct that error. Determining the most likely error requires an error model. A common
family of error models for classical computation is the independent error model in which each
bit has probability $p < 1/2$ of flipping. In this model, the chance of any one of the single bit-flip
errors $100\cdots0$, $010\cdots0$, ..., $000\cdots1$ occurring is $p(1-p)^{n-1}$, the chance of the two-bit error $110\cdots0$
occurring is $p^2(1-p)^{n-2}$, and the chance of no error occurring is $(1-p)^n$.
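The probabilities quoted for the independent error model all follow from one formula: a specific error pattern of weight $w$ on $n$ bits occurs with probability $p^w(1-p)^{n-w}$. A small sketch, with hypothetical function names and an arbitrarily chosen $p = 0.1$ for the checks:

```python
from math import comb

def pattern_probability(p, n, weight):
    """Probability of one specific error pattern with the given Hamming weight."""
    return p ** weight * (1 - p) ** (n - weight)

def weight_probability(p, n, weight):
    """Probability that the error has the given weight, over all such patterns."""
    return comb(n, weight) * pattern_probability(p, n, weight)
```

Since $p < 1/2$, `pattern_probability` strictly decreases with the weight, which is why lower-weight errors are treated as more likely in what follows.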

This error model guides the error correction strategy. Since under this model the absence of error
is more likely than any particular error, if a codeword is received, our best bet is to assume that no
error occurred. Suppose $w$ is a non-codeword we wish to correct. Let $c$ be an element of $C$ that is closest
to $w$ in the Hamming distance. If the closest element is unique, the most likely error to have
occurred under the independent error model is $e = c \oplus w$. Let $w'$ be another element of the
coset containing $w$, so $w' = w \oplus k$ for some $k \in C$. If $c$ is the closest element in $C$ to $w$, then
$c' = c \oplus k$ must be the closest element in $C$ to $w'$. The most likely error resulting in $w'$ is also $e$,
because

$$w' \oplus c' = w \oplus k \oplus c \oplus k = w \oplus c = e.$$

Thus, all elements of a coset are equally close to $C$ in the Hamming distance. By definition of $c$,
the most likely error $e$ is the element of the coset with the lowest Hamming weight.

Once the syndrome computation tells us the coset, we correct by applying the lowest weight
element $e$ of that coset. If the actual error was a different one, we have "corrected" to the wrong
word, but no better strategy exists. In particular, if we receive a codeword, we do nothing. In
general, error correcting codes cannot correct errors that take codewords to codewords. Furthermore,
if there is more than one closest element to $w$ in $C$, it is unclear how best to correct the error.
For this reason, when working under the independent error model, the set of correctable errors
is usually taken to be $\mathcal{E}_t$, the set of all words of Hamming weight $t$ or less, where $t$ is as large
as possible without introducing ambiguity or, equivalently, violating the disjointness condition
(equation 11.1) for a set of correctable errors.
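The strategy of correcting with the lowest-weight element of each coset can be sketched as a lookup table from syndromes to coset leaders. The code below is a brute-force illustration suitable only for small $n$; the names `coset_leaders` and `decode` are hypothetical, and the parity check matrix is the $(A|I)$ matrix given earlier for the [7, 4] Hamming code.

```python
from itertools import product

def syndrome(P, word):
    """Compute P * word over Z_2; words in the same coset share a syndrome."""
    return tuple(sum(p * w for p, w in zip(row, word)) % 2 for row in P)

def coset_leaders(P):
    """Map each syndrome to a lowest-weight word that produces it."""
    n = len(P[0])
    leaders = {}
    # Visit words by increasing Hamming weight so the first hit is a leader.
    for word in sorted(product([0, 1], repeat=n), key=sum):
        leaders.setdefault(syndrome(P, word), word)
    return leaders

def decode(P, leaders, received):
    """Correct by adding the coset leader for the received word's syndrome."""
    e = leaders[syndrome(P, received)]
    return tuple(r ^ b for r, b in zip(received, e))

P = [(1, 1, 1, 0, 1, 0, 0),
     (1, 1, 0, 1, 0, 1, 0),
     (1, 0, 1, 1, 0, 0, 1)]
leaders = coset_leaders(P)
```

When a coset has several words of minimal weight, `setdefault` simply keeps the first one encountered; that tie-breaking rule is an arbitrary choice of this sketch, matching the text's remark that such cases are ambiguous.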

The minimum Hamming distance between any pair of codewords is called the distance of
the code. An $[n, k, d]$ code is one that uses $n$-bit words to encode $k$-bit message words and has
distance $d$. For each codeword $c$, let

$$e_t(c) = \{v \mid d_H(v, c) \leq t\}$$

be the set of words no more than Hamming distance $t$ away from $c$. The set $e_t(c)$ contains exactly
the words $v$ obtained from $c$ by an error of weight at most $t$. If the sets $e_t(c)$ and $e_t(c')$ are disjoint for all pairs
of code words $c$ and $c'$, the code can correct any error of weight at most $t$ by mapping words in $e_t(c)$ to
the codeword $c$. The sets $e_t(c)$ are disjoint if and only if $d \geq 2t + 1$. So an $[n, k, d]$-code can
correct at most all errors with weight less than or equal to $t = \lfloor \frac{d-1}{2} \rfloor$. For a distance $d$ code $C$,
the maximum possible $t$ satisfies $2t + 1 \leq d$, since otherwise two codewords could be mapped
to the same error word under two different $t$-bit errors, and the disjointness condition would
not hold.
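For a small linear code, the distance $d$ and the resulting $t = \lfloor (d-1)/2 \rfloor$ can be found by enumeration. The sketch below relies on the standard fact that, for a linear code, the minimum distance equals the smallest weight of a non-zero codeword; the function names are hypothetical, and the generator matrix is the one obtained from the $(A|I)$ parity check matrix of the [7, 4] Hamming code.

```python
from itertools import product

def codewords_from_generator(G):
    """All words G y over Z_2, with G given as n rows of length k."""
    k = len(G[0])
    return {tuple(sum(g * y for g, y in zip(row, msg)) % 2 for row in G)
            for msg in product([0, 1], repeat=k)}

def minimum_distance(words):
    """Smallest Hamming weight of a non-zero codeword of a linear code."""
    return min(sum(w) for w in words if any(w))

G = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
     [1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 1]]
words = codewords_from_generator(G)
d = minimum_distance(words)
t = (d - 1) // 2  # largest t with 2t + 1 <= d
```

For this code the enumeration gives $d = 3$ and hence $t = 1$, in agreement with the statement that Hamming codes correct single bit-flip errors.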

11.2.11 Quantum Independent Error Models 

For a given quantum code C, some sets of correctable errors for C are better than others from a 
practical point of view. As in the classical case, which correctable sets are better depends on which 
errors are more probable. Because there is a richer class of quantum errors, there is a greater variety 
of quantum error models to choose from. The most common quantum error models assume, as 
the classical independent error model does, that errors on separate qubits occur independently 
and that, with probability p, a given qubit is subject to an error. The error model we describe is 
motivated by the local and Markov assumptions discussed in section 10.4.4. 

Because unitary errors are easily corrected by applying the inverse transformation, sets of
correctable errors are chosen to contain only unitary error transformations. It is particularly
common to choose a correctable set of errors containing only elements of the generalized Pauli
group $\mathcal{G}_n$. The generalized Pauli group $\mathcal{G}_n$ consists of $n$-fold tensor products of Pauli group
elements: all elements of $\mathcal{G}_n$ are of the form

$$\mu A_1 \otimes A_2 \otimes \cdots \otimes A_n$$

where $A_i \in \{I, X, Y, Z\}$ and $\mu \in \{1, -1, i, -i\}$. The commutation relations, the relations between
group products $g_ig_j$ and $g_jg_i$, for the Pauli group imply that every element of $\mathcal{G}_n$ can be written
as

$$\mu (X^{a_1} \otimes \cdots \otimes X^{a_n})(Z^{b_1} \otimes \cdots \otimes Z^{b_n}),$$

where the $a_i$ and $b_i$ are binary values.

Section 10.4.4 showed that any error can be expressed as a mixture of linear transformations

$$\frac{A_i}{\sqrt{\mathrm{tr}(A_i \rho A_i^\dagger)}}.$$

The generalized Pauli group $\mathcal{G}_n$ contains a basis for the vector space of linear transformations acting
on the vector space associated with an $n$-qubit system. Thus, a general error $E$ on an $n$-qubit
quantum register can be expressed as a linear combination $\sum_j e_j E_j$ where $E_j \in \mathcal{G}_n$. All linear
transformations arising in an operator sum decomposition can be written not only in terms of
unitary operators, but also in terms of generalized Pauli operators. By results of section 11.2.9, a
mixture of errors, each of which is corrected by a procedure, is also corrected by that procedure.
We write $X_i$ for the transform that applies $X$ to the $i$th qubit and leaves the others alone:

$$\underbrace{I \otimes \cdots \otimes I}_{i} \otimes X \otimes \underbrace{I \otimes \cdots \otimes I}_{n-i-1}.$$

The meaning of $Y_i$ and $Z_i$ is similar. The weight of a Pauli error is the number of nonidentity
terms in its tensor product expression. The weight of an error is defined only for Pauli errors, not
for general errors.

The generalized Pauli group has a number of convenient properties. For example, the stabilizer
codes of section 11.4 make heavy use of the fact that any two elements $g_1$ and $g_2$ in the Pauli group
either commute ($g_1g_2 = g_2g_1$) or anticommute ($g_1g_2 = -g_2g_1$). Another convenient property is
that if the set of all single-qubit bit-flip and phase-flip errors $X_i$ and $Z_i$ for all $i$ is a correctable
set $\mathcal{E}$ for a code $C$, then $\mathcal{E}$ can be expanded to contain the $Y_i$ errors for all $i$. The orthogonality
condition for $\mathcal{E}$ and $C$ says that if $X_i$ and $Z_i$ are correctable errors, then for all $i$, the following
four expressions are zero:

$$\langle c_1|X_i^\dagger Z_i|c_2\rangle = \langle c_1|Z_i^\dagger X_i|c_2\rangle = \langle c_1|I Z_i|c_2\rangle = \langle c_1|I X_i|c_2\rangle = 0.$$

To show that the $Y_i$ are compatible correctable errors, it suffices to show that for all $i$ and $j$ and
for all orthonormal $|c_1\rangle \neq |c_2\rangle \in C$,

$$\langle c_1|X_j^\dagger Y_i|c_2\rangle = 0$$

$$\langle c_1|Z_j^\dagger Y_i|c_2\rangle = 0$$

$$\langle c_1|I Y_i|c_2\rangle = 0,$$

and for all $j \neq i$

$$\langle c_1|Y_j^\dagger Y_i|c_2\rangle = 0.$$

These equalities follow immediately from multiplication in the Pauli group. For example,
because

$$X_i^\dagger Y_i = -X_i X_i Z_i = -I Z_i,$$

$$\langle c_1|X_i^\dagger Y_i|c_2\rangle = -\langle c_1|I Z_i|c_2\rangle = 0.$$

Thus, any code that corrects all bit-flip errors $X_i$ and all phase-flip errors $Z_i$ also corrects all $Y_i$
errors.
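Two of the properties used here, the weight of a Pauli error and the commute-or-anticommute dichotomy, are easy to compute when a Pauli error is written as a string over I, X, Y, Z with the overall phase ignored. The sketch below assumes that string representation; the function names are hypothetical. It uses the binary form $X^{a}Z^{b}$ from the text: two elements commute exactly when $a \cdot b' + b \cdot a'$ is even.

```python
def weight(pauli):
    """Number of non-identity tensor factors in a string such as 'XIZ'."""
    return sum(1 for factor in pauli if factor != "I")

def to_bits(pauli):
    """Binary vectors (a, b) such that factor i is, up to phase, X^a_i Z^b_i."""
    a = [int(f in "XY") for f in pauli]
    b = [int(f in "ZY") for f in pauli]
    return a, b

def commute(p, q):
    """True if the two Pauli strings commute, False if they anticommute."""
    a1, b1 = to_bits(p)
    a2, b2 = to_bits(q)
    crossings = sum(x * z for x, z in zip(a1, b2)) + sum(z * x for z, x in zip(b1, a2))
    return crossings % 2 == 0
```

For example, `commute("XX", "ZZ")` is `True` because the two sign changes from the individual qubits cancel, while `commute("XI", "ZI")` is `False`.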

Let $t$ be the maximum weight for which the set of Pauli group elements of weight $t$ or less
satisfies the correctable error set condition (equation 11.2). Any nondegenerate $[[n, k]]$-quantum code
cannot correct errors of more than weight $t$. Section 11.2.4 showed that the maximum number
of elements in a correctable set for a nondegenerate code is $2^{n-k}$. The number of elements of
weight $t$ is $3^t \binom{n}{t}$. Thus, any nondegenerate code that corrects all errors with weight $t$ or less
must satisfy the quantum Hamming bound

$$\sum_{i=0}^{t} 3^i \binom{n}{i} \leq 2^{n-k}.$$

A nondegenerate code that obtains equality in the quantum Hamming bound is called a perfect
code. The classical Hamming bound is discussed in box 11.3. The quantum Hamming bound does
not apply to degenerate codes. All classical codes satisfy the classical Hamming bound. That the
quantum Hamming bound does not apply to all codes provides an example of how the existence
of degenerate codes complicates the quantum picture.
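Both Hamming bounds are simple counting inequalities and can be evaluated directly. The sketch below, with hypothetical function names, computes the left-hand side of the quantum bound and of the classical bound of box 11.3 and compares it with $2^{n-k}$. The parameter choices in the usage note are illustrative; only the arithmetic is claimed, not the existence of particular codes.

```python
from math import comb

def classical_hamming_lhs(n, t):
    """Number of n-bit error words of weight at most t."""
    return sum(comb(n, i) for i in range(t + 1))

def quantum_hamming_lhs(n, t):
    """Number of n-qubit Pauli errors, up to phase, of weight at most t."""
    return sum(3 ** i * comb(n, i) for i in range(t + 1))

def compare_with_bound(lhs, n, k):
    """Return '<' if the bound holds strictly, '=' if with equality, '>' if violated."""
    rhs = 2 ** (n - k)
    return "<" if lhs < rhs else "=" if lhs == rhs else ">"
```

With $t = 1$ and $k = 1$, the quantum count is 22 for $n = 7$ (strictly below $2^6 = 64$), 16 for $n = 5$ (equal to $2^4$), and 13 for $n = 4$ (above $2^3 = 8$). The classical count for $n = 7$, $k = 4$, $t = 1$ is 8, equal to $2^3$, so the [7, 4] Hamming code is perfect in the classical sense.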

Just as in the classical case, the term perfect should not be taken to imply that perfect codes 
are necessarily the best ones to use in practice. The quantum Hamming bound quantifies the best 
trade-off in terms of code expansion (ratio of size of encoded state to original message state) 
and the strength of the error correction in terms of the number of single-qubit errors the code 
can correct. A third quantity is also of great practical interest: the efficiency with which errors 
can be detected. There are many codes that come close to the quantum Hamming bound but 
that do not have efficient error detection schemes, as measured in terms of the number of gates 
needed for syndrome extraction and the number of qubits that need to be measured. Both in the 
quantum and the classical cases, significant structure must be in place in order for efficient error 
detection schemes to be possible. The design of classical, as well as quantum, error correction 
schemes with efficient error detection and good trade-offs between data expansion and strength 
is a continuing area of research. Stabilizer codes provide this structure; for this reason, nearly all
quantum error correction codes are stabilizer codes. CSS codes, a subset of stabilizer codes, have
the advantage that they can be built from pairs of classical codes that are related to each other in a
special way.


Box 11.3

The Classical Hamming Bound


For any $[n, k, d]$-code, there are $\binom{n}{i}$ weight $i$ errors, so the cardinality of $e_t(c)$ is

$$|e_t(c)| = \sum_{i=0}^{t} \binom{n}{i}.$$

Since there are $2^k$ codewords, the sets $e_t(c)$ can be disjoint only if $|e_t(c)| \, 2^k \leq 2^n$. Thus, any $[n, k]$
code that corrects all errors of weight $t$ or less must satisfy the following bound:

$$\sum_{i=0}^{t} \binom{n}{i} \leq 2^{n-k}.$$

This condition is called the (classical) Hamming bound. A code for which equality holds is called a
perfect code, since it uses the minimum size $n$ to encode $k$-bit message words in such a way that all
weight $t$ errors can be corrected. This bound on $t$ is independent of $d$.

11.3 CSS Codes 

Shor’s code encodes a single qubit into three qubits to correct bit-flip errors and then re-encodes the
resulting logical qubits to correct phase-flip errors. Recall from section 11.1.2 that $X_i = H Z_i H$,
so bit-flip errors $X_i$ are closely related to phase-flip errors $Z_i$; bit-flip errors in the standard basis
$\{|0\rangle, |1\rangle\}$ are phase-flip errors in the Hadamard basis $\{|+\rangle, |-\rangle\}$ and vice versa. Calderbank and
Shor, and separately Steane, recognized that by using this relation they could construct quantum
codes from pairs of classical codes that satisfy a certain duality relation. These codes, called CSS
codes after their founders, have a number of advantages. For example, by encoding only once to
correct both phase- and bit-flip errors, the number of qubits required to correct $t$ qubit errors can
be reduced: the most famous CSS code, Steane’s $[[7, 1]]$ code, requires seven qubits to correct all
single-qubit errors, as opposed to the nine qubits needed in Shor’s code.

11.3.1 Dual Classical Codes 

Two classical codes $C_1$ and $C_2$ are dual to each other, $C_1 = C_2^\perp$, if a generator matrix for one is
the transpose of a parity check matrix for the other: $G_1 = P_2^T$. Two sets of words $V$ and $W$ are
said to be orthogonal if for all $v \in V$ and $w \in W$, the inner product $v^T w = 0 \bmod 2$, where $v$ and
$w$ are viewed as vectors. Let $C$ and $C^\perp$ be dual codes with generator matrices and parity check
matrices $\{G, P\}$ and $\{G^\perp, P^\perp\}$, respectively. The codewords of $C^\perp$ are orthogonal to the codewords
of $C$ because $G^\perp = P^T$: $v \in C^\perp$ and $w \in C$ means that there exist $x$ and $y$ such that $v = G^\perp x$
and $w = Gy$, so

$$v^T w = (G^\perp x)^T G y = (P^T x)^T G y = x^T P G y = 0.$$


Example 11.3.1 The dual code to the [7, 4] Hamming code is the [7, 3] code $C^\perp$ with
generator matrix

$$G^\perp = P^T = \begin{pmatrix} 1&1&1&0&1&0&0 \\ 1&1&0&1&0&1&0 \\ 1&0&1&1&0&0&1 \end{pmatrix}^T$$

and parity check matrix

$$P^\perp = G^T = \begin{pmatrix} 1&1&1&0&1&0&0 \\ 1&1&0&1&0&1&0 \\ 1&0&1&1&0&0&1 \\ 1&1&1&1&1&1&1 \end{pmatrix}.$$

Since the rows of $P$ are a subset of those of $P^\perp$, it follows that $C$ contains its own dual: $C^\perp \subset C$.
The eight codewords of $C^\perp$ are the linear combinations of the columns of $G^\perp$:

$$C^\perp = \{0000000, 1110100, 1101010, 0011110, 1011001, 0101101, 0110011, 1000111\}.$$

The sixteen codewords of $C$ are those of $C^\perp$ plus those obtained by adding 1111111 to all of the
codewords of $C^\perp$.
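The codeword lists in this example can be reproduced by enumerating linear combinations. The sketch below uses hypothetical helper names (`span`, `dot`) and represents words as tuples of bits; it generates $C^\perp$ from the three rows of $P$ and $C$ from the four rows of $P^\perp$.

```python
from itertools import product

def span(rows):
    """All Z_2 linear combinations of the given rows."""
    n = len(rows[0])
    return {tuple(sum(c * r[i] for c, r in zip(coeffs, rows)) % 2 for i in range(n))
            for coeffs in product([0, 1], repeat=len(rows))}

def dot(u, v):
    """Inner product mod 2."""
    return sum(a * b for a, b in zip(u, v)) % 2

P = [(1, 1, 1, 0, 1, 0, 0),
     (1, 1, 0, 1, 0, 1, 0),
     (1, 0, 1, 1, 0, 0, 1)]
dual = span(P)                               # the eight words of the dual code
code = span(P + [(1, 1, 1, 1, 1, 1, 1)])     # the sixteen words of the Hamming code
dual_as_strings = {"".join(map(str, w)) for w in dual}
```

The set `dual_as_strings` reproduces the eight strings listed above, `dual` is a subset of `code`, and every word of `dual` is orthogonal to every word of `code`.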


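For any $[n, k]$ classical code $C$,

$$\sum_{c \in C} (-1)^{c \cdot x} = \begin{cases} 2^k & \text{if } x \in C^\perp \\ 0 & \text{otherwise.} \end{cases} \qquad (11.7)$$

This identity may be established by relating it to the identity

$$\sum_{y=0}^{N-1} (-1)^{y \cdot x} = \begin{cases} 0 & \text{for } x \neq 0 \\ N = 2^n & \text{for } x = 0 \end{cases}$$

from box 7.1. Because $x \cdot Gy = G^T x \cdot y$, the inner product of the two $n$-bit strings $x$ and $Gy$ is
equal to the inner product of the two $k$-bit strings $G^T x$ and $y$, so

$$\sum_{c \in C} (-1)^{c \cdot x} = \sum_{y=0}^{2^k-1} (-1)^{Gy \cdot x} = \sum_{y=0}^{2^k-1} (-1)^{y \cdot G^T x} = \begin{cases} 2^k & \text{if } G^T x = 0 \\ 0 & \text{otherwise.} \end{cases}$$

Identity 11.7 follows, since $G^T x = P^\perp x = 0$ precisely when $x \in C^\perp$.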
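Identity 11.7 can be checked exhaustively for a small code. The following sketch does so for the [7, 4] Hamming code of example 11.3.1, where $2^k = 16$; the helper names are hypothetical and the check is brute force over all 128 words $x$.

```python
from itertools import product

def span(rows):
    """All Z_2 linear combinations of the given rows."""
    n = len(rows[0])
    return {tuple(sum(c * r[i] for c, r in zip(coeffs, rows)) % 2 for i in range(n))
            for coeffs in product([0, 1], repeat=len(rows))}

def dot(u, v):
    """Inner product mod 2."""
    return sum(a * b for a, b in zip(u, v)) % 2

def character_sum(code, x):
    """Sum of (-1)^(c . x) over all codewords c."""
    return sum((-1) ** dot(c, x) for c in code)

P = [(1, 1, 1, 0, 1, 0, 0),
     (1, 1, 0, 1, 0, 1, 0),
     (1, 0, 1, 1, 0, 0, 1)]
dual = span(P)
code = span(P + [(1, 1, 1, 1, 1, 1, 1)])
sums = {x: character_sum(code, x) for x in product([0, 1], repeat=7)}
```

Each entry of `sums` is 16 when `x` lies in `dual` and 0 otherwise, which is the content of the identity for this code.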


11.3.2 Construction of CSS Codes from Classical Codes Satisfying a Duality Condition 

Identity 11.7 enables the construction of states that are superpositions of codewords from a
classical code $C$ when viewed in the standard basis and are superpositions of dual codewords
$w \in C^\perp$ when viewed in the Hadamard basis. More precisely, we construct states $|\psi_g\rangle$ that are
superpositions of codewords from $C$, and show that they have amplitude only in the states $|h_i\rangle$
where $i \in C^\perp$ and the $|h_i\rangle$ are elements of the $n$-qubit Hadamard basis:

$$|h_i\rangle = W|i\rangle = H \otimes \cdots \otimes H|i\rangle.$$

After constructing these states, this section shows how this property enables the correction of 
both phase-flip and bit-flip errors. 
Box 11.4

Quotient Groups


If $H < G$ and the conjugacy condition, $gHg^{-1} = H$, holds for all elements $g \in G$, then the
cosets of $H$ form a group, called a quotient group $G/H$ of the group $G$. Let us be more precise. Let
$S = g_1H$ and $T = g_2H$ be cosets of $H$ in $G$. Then $ST = g_1Hg_2H = g_1g_2(g_2^{-1}Hg_2)H = g_1g_2H$ is
another coset $R = g_1g_2H$ of $H$ in $G$.

Let $f : G \to H$ be a homomorphism. Let $K$ be the set of elements of $G$ that are sent to the identity
in $H$. Then $K$ is a subgroup of $G$ that satisfies the conjugacy condition, so the cosets of $K$ form a
quotient group $G/K$. If $f$ is onto, the quotient group $G/K$ is isomorphic to $H$.


Let $C_1$ and $C_2^\perp$ be $[n, k_1]$ and $[n, k_2]$ classical codes respectively, and suppose both $C_1$ and $C_2$
correct $t$ errors. Furthermore, suppose $C_2^\perp \subset C_1$. There are $2^{k_1-k_2}$ distinct cosets of $C_2^\perp$ in $C_1$;
every $c \in C_1$ defines a coset $c \oplus C_2^\perp = \{c \oplus c' \mid c' \in C_2^\perp\}$, and $c \oplus C_2^\perp = d \oplus C_2^\perp$ if and only if
$c \oplus d \in C_2^\perp$. The set of cosets forms a group, the quotient group $G = C_1/C_2^\perp$. Since $C_1 \cong \mathbf{Z}_2^{k_1}$
and $C_2^\perp \cong \mathbf{Z}_2^{k_2}$, the quotient group $G \cong \mathbf{Z}_2^{k_1-k_2}$. For each element $g \in G$, define a quantum state

$$|\psi_g\rangle = \frac{1}{\sqrt{2^{k_2}}} \sum_{c \in C_2^\perp} |c_g \oplus c\rangle,$$

where $c_g$ is any element of $C_1$ contained in the coset of $C_2^\perp$ labeled by $g$. The $2^{k_1-k_2}$-dimensional
subspace spanned by the $|\psi_g\rangle$ for all $g \in G$ defines an $[[n, k_1 - k_2]]$ quantum code $C$, the CSS
code $CSS(C_1, C_2)$.
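The coset structure behind the CSS code states can be enumerated for a small case. The sketch below takes $C_1$ to be the [7, 4] Hamming code of example 11.3.1 and $C_2^\perp$ to be its [7, 3] dual, a choice made here for illustration. The helper names are hypothetical, and each state is represented only by the set of basis labels that carry its equal amplitudes $1/\sqrt{2^{k_2}}$.

```python
from itertools import product

def span(rows):
    """All Z_2 linear combinations of the given rows."""
    n = len(rows[0])
    return {tuple(sum(c * r[i] for c, r in zip(coeffs, rows)) % 2 for i in range(n))
            for coeffs in product([0, 1], repeat=len(rows))}

def xor(u, v):
    """Bitwise sum mod 2 of two words."""
    return tuple(a ^ b for a, b in zip(u, v))

def css_basis_states(C1, C2_perp):
    """One frozenset of basis labels per coset of C2_perp in C1."""
    return {frozenset(xor(c, d) for d in C2_perp) for c in C1}

P = [(1, 1, 1, 0, 1, 0, 0),
     (1, 1, 0, 1, 0, 1, 0),
     (1, 0, 1, 1, 0, 0, 1)]
C2_perp = span(P)
C1 = span(P + [(1, 1, 1, 1, 1, 1, 1)])
states = css_basis_states(C1, C2_perp)
```

Here $k_1 - k_2 = 4 - 3 = 1$, so there are two cosets and the resulting quantum code encodes a single qubit in seven.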

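This paragraph shows that $|\psi_g\rangle$, when viewed in the Hadamard basis, only has amplitude in
the codewords of $C_2$. The components of $|\psi_g\rangle$ in the Hadamard basis are

$$\langle h_i|\psi_g\rangle |h_i\rangle = \langle i|W|\psi_g\rangle W|i\rangle.$$

Therefore it suffices to show that $W|\psi_g\rangle$ is a superposition of codewords $|c\rangle$ with $c \in C_2$: recall from
section 7.1.1 that

$$W|y\rangle = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} (-1)^{y \cdot x} |x\rangle.$$

So

$$W|\psi_g\rangle = \frac{1}{\sqrt{2^{k_2}}} \sum_{c \in C_2^\perp} \frac{1}{\sqrt{2^n}} \sum_{x=0}^{N-1} (-1)^{(c_g \oplus c) \cdot x} |x\rangle$$

$$= \frac{1}{\sqrt{2^{n+k_2}}} \sum_{x=0}^{N-1} (-1)^{c_g \cdot x} \sum_{c \in C_2^\perp} (-1)^{c \cdot x} |x\rangle$$

$$= \frac{1}{\sqrt{2^{n+k_2}}} \sum_{x \in (C_2^\perp)^\perp} (-1)^{c_g \cdot x} (2^{k_2}) |x\rangle$$

$$= \frac{1}{\sqrt{2^{n-k_2}}} \sum_{x \in C_2} (-1)^{c_g \cdot x} |x\rangle,$$

where line 3 follows from line 2 by identity 11.7.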

Now we turn to how the error correction is carried out. Since each $|\psi_g\rangle$ is a linear combination
of codewords in $C_1$, a quantum version of the syndrome for $C_1$ can be used to correct up to $t$
bit-flip errors. More specifically, each row of the parity check matrix $P_1$ tests whether the sum
(mod 2) of the corresponding set of bits is even or odd. If a row of the parity check matrix
reads $b = b_{n-1} \ldots b_1 b_0$, then the observable that makes the analogous check on quantum states
is $Z^{b_{n-1}} \otimes \cdots \otimes Z^{b_1} \otimes Z^{b_0}$, the operator with a $Z$ in every place the parity check has a 1 and an $I$ in every
place the parity check has a 0. More generally, for any single-qubit unitary transformation $Q$, let
$Q^b$ be the tensor product $Q^{b_{n-1}} \otimes \cdots \otimes Q^{b_1} \otimes Q^{b_0}$. Let $b \in P$ mean that $b$ appears as a row in $P$.
To realize these observables in terms of single-qubit measurement, each row $b \in P_1$ corresponds
to a component of a quantum circuit on $n + 1$ qubits, the $n$ computational qubits plus an ancilla
qubit. The component has a $C_{not}$ between the $i$th qubit and the ancilla wherever the $i$th entry of
the row has a 1.
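The translation from a parity check row to an observable and to a list of controlled-not gates is mechanical. The sketch below assumes a row is given as a list of bits read left to right in the same order as the tensor factors; the function names are hypothetical, and `measured_parity` only tracks what the ancilla would hold for a standard-basis input.

```python
def z_observable(row):
    """Pauli string with Z where the row has a 1 and I where it has a 0."""
    return "".join("Z" if bit else "I" for bit in row)

def cnot_controls(row):
    """Positions (0 = leftmost entry) of the qubits that control a C_not onto the ancilla."""
    return [i for i, bit in enumerate(row) if bit]

def measured_parity(row, basis_state):
    """Ancilla value after those C_not gates act on a standard-basis state."""
    return sum(basis_state[i] for i in cnot_controls(row)) % 2

P1 = [[1, 1, 1, 0, 1, 0, 0],
      [1, 1, 0, 1, 0, 1, 0],
      [1, 0, 1, 1, 0, 0, 1]]
observables = [z_observable(row) for row in P1]  # one observable per parity check
```

For a codeword every ancilla reads 0; after a single bit flip the ancilla values spell out the corresponding column of the parity check matrix.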

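To see how the code handles phase errors, we first confirm that phase-flip errors become bit-flip
errors under $W$. Let $e$ be the bit string indicating the location of the phase-flip errors. Under this
error, $|\psi_g\rangle$ becomes

$$\frac{1}{\sqrt{2^{k_2^\perp}}} \sum_{c \in C_2^\perp} (-1)^{e \cdot (c_g \oplus c)}|c_g \oplus c\rangle,$$

which, after applying $W$, becomes

$$\begin{aligned}
&\frac{1}{\sqrt{2^{n+k_2^\perp}}} \sum_{c \in C_2^\perp} (-1)^{e \cdot (c_g \oplus c)} \sum_{x=0}^{N-1} (-1)^{x \cdot (c_g \oplus c)}|x\rangle \\
&\quad= \frac{1}{\sqrt{2^{n+k_2^\perp}}} \sum_{x=0}^{N-1} (-1)^{(e \oplus x) \cdot c_g} \sum_{c \in C_2^\perp} (-1)^{(e \oplus x) \cdot c}|x\rangle \\
&\quad= \frac{1}{\sqrt{2^{n-k_2^\perp}}} \sum_{x \oplus e \in C_2} (-1)^{(e \oplus x) \cdot c_g}|x\rangle \\
&\quad= \frac{1}{\sqrt{2^{n-k_2^\perp}}} \sum_{y \in C_2} (-1)^{y \cdot c_g}|y \oplus e\rangle.
\end{aligned}$$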


This state differs from $W|\psi_g\rangle$ by exactly the bit-flip error corresponding to the string $e$.

Since applying $W$ to $|\psi_g\rangle$ yields a linear combination of elements of $C_2$, and under $W$ phase-flip
errors become bit-flip errors, a quantum version of the syndrome for $C_2$ can be used to correct
phase errors. Following the construction for bit-flip error detection, for each row in the parity
check matrix $P_2$, construct a component of a quantum circuit that has a $C_{not}$ operator between qubit $i$
and the ancilla qubit if and only if there is a 1 in the $i$th entry of the row.

A variant of this construction gives an even more direct connection between the quantum
syndrome computation and the parity check matrices. The following two circuits, involving
a computational qubit and an ancilla, have the same effect on states $|\psi\rangle|0\rangle$ where $|\psi\rangle$ is any
single-qubit state.



Measuring the ancilla in either case results in the same state with the same probability. So instead
of applying the Walsh-Hadamard transformation to the computational qubits and then $C_{not}$ from
computational qubits to the ancilla, simply apply a Hadamard gate to the ancilla qubit and use it
to control bit flips on the computational qubits; the ancilla then measures an observable made of
$X$ and $I$ terms, which is the check that detects phase flips. A similar argument implies that to correct bit
flips we can apply a Hadamard transformation to the ancilla and use it to control phase flips on the
computational qubits.
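
A circuit identity of this kind is easy to confirm numerically. The sketch below is an illustration, not a reproduction of the figure: it assumes that the first circuit applies a Hadamard gate to the computational qubit before and after a $C_{not}$ controlled by that qubit, and that the second applies a Hadamard gate to the ancilla before and after a $C_{not}$ controlled by the ancilla. The variable names are ours.

```python
import numpy as np

I2 = np.eye(2)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]])
P0 = np.array([[1, 0], [0, 0]])  # projector |0><0|
P1 = np.array([[0, 0], [0, 1]])  # projector |1><1|

# Qubit order in the tensor products: computational qubit first, ancilla second.
cnot_comp_to_anc = np.kron(P0, I2) + np.kron(P1, X)  # control: computational qubit
cnot_anc_to_comp = np.kron(I2, P0) + np.kron(X, P1)  # control: ancilla

# Hadamards on the computational qubit around a C_not onto the ancilla ...
left = np.kron(H, I2) @ cnot_comp_to_anc @ np.kron(H, I2)
# ... versus Hadamards on the ancilla around a bit flip controlled by the ancilla.
right = np.kron(I2, H) @ cnot_anc_to_comp @ np.kron(I2, H)

circuits_agree = np.allclose(left, right)
print(circuits_agree)
```

Because the two unitaries are equal as matrices, they agree on all inputs, in particular on the states $|\psi\rangle|0\rangle$ considered in the text.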

Studying stabilizer codes, a generalization of CSS codes, will illuminate why correct computational
states remain undisturbed by the syndrome computation. Altogether, the CSS code has

$$n - k_1 + n - k_2 = 2n - k_1 - k_2$$

observables, $n - k_1$ that contain only $Z$ and $I$ terms and correct bit-flip errors, and $n - k_2$ that
contain only $X$ and $I$ terms and correct phase-flip errors. Instead of constructing CSS codes starting
with superpositions of classical codewords, we could have begun the construction with the
observables corresponding to the parity check matrices for the codes $C_1$ and $C_2$. This approach
will be pursued in section 11.4 on stabilizer codes.

11.3.3 The Steane Code 

Steane’s [[7, 1]] code C is based on the [7, 4] Hamming code. We revisit this code multiple times, 
first in section 11.4 as an example of stabilizer codes, and then in chapter 12 as the running 
example illustrating the design of fault-tolerant procedures. 











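Recall from example 11.3.1 that

$$C^\perp = \{0000000, 1110100, 1101010, 0011110, 1011001, 0101101, 0110011, 1000111\},$$

and that $C$ contains sixteen codewords, those of $C^\perp$ plus those obtained by adding 1111111 to all
of the codewords of $C^\perp$. Since $C$ contains its own dual, the conditions for the CSS construction
are satisfied by taking $C_1 = C$ and $C_2 = C$. Following the CSS construction,

$$\begin{aligned}
|0\rangle \to |\tilde 0\rangle &= \frac{1}{\sqrt 8} \sum_{c \in C^\perp} |c\rangle \\
&= \frac{1}{\sqrt 8}\big(|0000000\rangle + |1110100\rangle + |1101010\rangle + |0011110\rangle \\
&\qquad\quad + |1011001\rangle + |0101101\rangle + |0110011\rangle + |1000111\rangle\big)
\end{aligned}$$

and

$$\begin{aligned}
|1\rangle \to |\tilde 1\rangle &= \frac{1}{\sqrt 8} \sum_{c \in C,\, c \notin C^\perp} |c\rangle \\
&= \frac{1}{\sqrt 8}\big(|1111111\rangle + |0001011\rangle + |0010101\rangle + |1100001\rangle \\
&\qquad\quad + |0100110\rangle + |1010010\rangle + |1001100\rangle + |0111000\rangle\big).
\end{aligned}$$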


A syndrome extraction operator $U_P$ for the Steane code is based on a parity check matrix $P$ for
the [7, 4] Hamming code,

$$P = \begin{pmatrix}
1 & 1 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 1 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 1
\end{pmatrix}.$$


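The six observables for the Steane code are (a circuit for $S_1$ is shown in figure 11.1):

$$\begin{aligned}
S_1 &= Z \otimes Z \otimes Z \otimes I \otimes Z \otimes I \otimes I \\
S_2 &= Z \otimes Z \otimes I \otimes Z \otimes I \otimes Z \otimes I \\
S_3 &= Z \otimes I \otimes Z \otimes Z \otimes I \otimes I \otimes Z \\
S_4 &= X \otimes X \otimes X \otimes I \otimes X \otimes I \otimes I \\
S_5 &= X \otimes X \otimes I \otimes X \otimes I \otimes X \otimes I \\
S_6 &= X \otimes I \otimes X \otimes X \otimes I \otimes I \otimes X
\end{aligned} \qquad (11.8)$$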
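
The link between the rows of $P$ and these observables can be made concrete with a short Python sketch. It is a minimal illustration under our own naming (`observables`, `syndrome`, and `single_flip_syndromes` are hypothetical helpers, not part of the text): it generates the operator strings from $P$ and computes the classical syndrome $Pe \bmod 2$ for each single bit flip. A syndrome bit of 1 corresponds to a measurement outcome of $-1$ for the matching $Z$-type observable, since a bit flip on qubit $q$ anticommutes with exactly those observables that have a $Z$ in position $q$.

```python
import numpy as np

# Parity check matrix of the [7, 4] Hamming code, as given in the text.
P = np.array([[1, 1, 1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0, 1, 0],
              [1, 0, 1, 1, 0, 0, 1]])

def observables(P, pauli):
    """One operator string per row: `pauli` where the row has a 1, I elsewhere."""
    return ["".join(pauli if bit else "I" for bit in row) for row in P]

z_checks = observables(P, "Z")  # S1, S2, S3: detect bit flips
x_checks = observables(P, "X")  # S4, S5, S6: detect phase flips

def syndrome(P, error):
    """Classical syndrome P e (mod 2) of an error pattern e."""
    return tuple(int(s) for s in P @ np.array(error) % 2)

# Syndrome of a single bit flip on each of the seven qubits.
single_flip_syndromes = {}
for q in range(7):
    e = [0] * 7
    e[q] = 1
    single_flip_syndromes[q] = syndrome(P, e)

for q, s in single_flip_syndromes.items():
    print(q, s)
```

The seven syndromes are the seven columns of $P$; they are nonzero and pairwise distinct, which is why a single bit flip can be located. The same table applies to single phase flips when the $X$-type observables are measured.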

We postpone discussion of how to compute on the encoded states until after developing stabilizer
codes, a general class of codes that contains CSS codes.

Figure 11.1

One of six component circuits for a syndrome extraction operator $U_P$ for the Steane code.


11.4 Stabilizer Codes 

The stabilizer code construction generalizes the construction of CSS codes described in
section 11.3. The construction begins by recognizing that certain $[[n, k]]$ codes, $2^k$-dimensional
subspaces of a $2^n$-dimensional space, can be defined in terms of the set of operators that stabilize
the subspace.


Example 11.4.1 The Steane code is stabilized by the six observables $S_1, S_2, S_3, S_4, S_5, S_6$ of
equations 11.8; the states $|\tilde 0\rangle$ and $|\tilde 1\rangle$ of the Steane code are +1-eigenvectors of all six observables.


Section 11.4.1 explains how codes are defined by their stabilizers. It looks at the case of binary
observables serving as stabilizers for a code $C$. All of the observables used in the CSS code
construction have only two eigenvalues, −1 and +1. Section 11.4.1 uses properties of these
observables to determine conditions on a set of correctable errors for $C$. Section 11.4.2 further restricts
from binary observables to elements of the generalized Pauli group, yielding more specific
conditions on a set of correctable errors. This setup prepares for a full development of stabilizer code
error correction in section 11.4.3. Section 11.4.4 explains how computation is done on the logical
qubits of a stabilizer code using a new code, the [[5, 1]] stabilizer code, as a running example.

11.4.1 Binary Observables for Quantum Error Correction 

A subspace $W$ of a vector space $V$ is stabilized by an operator $S : V \to V$ if for all $|w\rangle \in W$,
$S|w\rangle = |w\rangle$. In other words, $W$ is stabilized by $S$ if $|w\rangle$ is a +1-eigenstate of $S$ for all $|w\rangle \in W$.
The stabilizer of a subspace $W \subset V$ is the set of all operators that stabilize $W$. Let $\mathcal{S}$ be the set of
all binary observables on $V$ with only +1 and −1 as eigenvalues. Any set of observables $\{S_i\} \subset \mathcal{S}$
defines a subspace $C$, the largest subspace stabilized by all elements $S_i$. Sometimes the code $C$ is
an attractive quantum error correcting code, while in other cases $C$ is not. For example, for some
sets of observables, $C$ is simply the zero vector. Our next task is to understand for what sets of
observables we get an interesting code by learning how to determine a correctable set of errors
for $C$ from the set of observables that defines $C$.

Suppose $S$ stabilizes $|v\rangle$ and $T$ anticommutes with $S$: in other words, $ST = -TS$. Then,

$$ST|v\rangle = -TS|v\rangle = -T|v\rangle,$$

so $T|v\rangle$ is a −1-eigenvector for $S$. If a code $C$ is stabilized by $S$, then for all $|v\rangle \in C$, the state
$T|v\rangle$ cannot be a codeword of $C$. That $T|v\rangle$ is not a codeword can be detected by measurement
with $S$. This fact enables us to express the condition on a set $\mathcal{E}$ of unitary errors, equation 11.2 of
section 11.2.4,

$$\langle c_a|E_i^\dagger E_j|c_b\rangle = m_{ij}\delta_{ab}, \qquad (11.9)$$

in terms of the set of stabilizers. Let $C$ be the code defined by the $r$ stabilizers $S_1, \dots, S_r$.
Suppose that for all pairs $E_i$ and $E_j$ of distinct elements of $\mathcal{E}$, either $E_i^\dagger E_j$ stabilizes $C$ or there is at
least one $S_l$ that anticommutes with $E_i^\dagger E_j$. The next paragraph shows that such an $\mathcal{E}$ is a correctable
set of errors for $C$.

Box 11.5

Stabilizers and Groups Acting on Sets


A group $G$ acts on a set $S$ if for all elements $g, g_1, g_2 \in G$ and elements $s \in S$

• it is meaningful to talk about applying $g$ to $s$ to obtain another element $gs$ of $S$,

• the identity $e$ of $G$ takes any $s$ to itself, $es = s$, and

• $(g_1 g_2)s = g_1(g_2 s)$.

A group may act on a set in many different ways, so it is important to define which action is being
talked about. We give some examples.

• For any $H < G$, the group $G$ acts on the set of cosets of $H$ in a canonical way: an element $g_1 \in G$
acts on a coset $g_2 H$ taking it to the coset $(g_1 g_2)H$.

• The group of unitary operators $U : V \to V$ acts on the vector space $V$, viewed as a set, by sending
$|v\rangle \in V$ to the vector $U|v\rangle$.

For any $s \in S$, the set of group elements that stabilize $s$ is a subgroup,

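$$H_s = \{g \in G \mid gs = s\},$$
called the stabilizer of $s$.

If $E_i^\dagger E_j$ stabilizes $C$,

$$\langle c_a|E_i^\dagger E_j|c_b\rangle = \langle c_a|c_b\rangle = \delta_{ab}.$$

On the other hand, suppose $E_i^\dagger E_j$ anticommutes with a stabilizer $S_l$. Then

$$\langle c_a|E_i^\dagger E_j|c_b\rangle = \langle c_a|E_i^\dagger E_j S_l|c_b\rangle = -\langle c_a|S_l E_i^\dagger E_j|c_b\rangle = -\langle c_a|E_i^\dagger E_j|c_b\rangle,$$

where the first equality uses $S_l|c_b\rangle = |c_b\rangle$ and the last uses $\langle c_a|S_l = \langle c_a|$, which holds because $S_l$ is
Hermitian and stabilizes $|c_a\rangle$. It follows that

$$\langle c_a|E_i^\dagger E_j|c_b\rangle = 0.$$

Since the $E_i$ are unitary,

$$\langle c_a|E_i^\dagger E_i|c_b\rangle = \delta_{ab}$$

for all $i$, $a$, and $b$. These equations show that $\mathcal{E}$ satisfies the quantum error condition, equation 11.2.
If, for some $i \neq j$, the transformation $E_i^\dagger E_j$ stabilizes $C$, the code $C$ is degenerate with respect
to $\mathcal{E}$. Otherwise, if for all $i \neq j$ each $E_i^\dagger E_j$ anticommutes with at least one $S_l$, then the code $C$ is
nondegenerate.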
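
The basic mechanism of this section, that an operator anticommuting with a stabilizer moves stabilized states into the −1-eigenspace, can be seen in the smallest possible setting. The following NumPy sketch is an illustration only; the choice $S = Z \otimes Z$, $T = X \otimes I$, and the state $(|00\rangle + |11\rangle)/\sqrt{2}$ are assumptions made for the example and do not come from the text.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

S = np.kron(Z, Z)                        # binary observable acting as a stabilizer
T = np.kron(X, I2)                       # error that anticommutes with S
v = np.array([1, 0, 0, 1]) / np.sqrt(2)  # (|00> + |11>)/sqrt(2), stabilized by S

anticommute = np.allclose(S @ T, -T @ S)       # ST = -TS
stabilized = np.allclose(S @ v, v)             # S|v> = |v>
flipped = np.allclose(S @ (T @ v), -(T @ v))   # S T|v> = -T|v>
overlap = v @ (T @ v)                          # <v|T|v>, expected to vanish
print(anticommute, stabilized, flipped, overlap)
```

The vanishing overlap is the two-qubit analog of the statement $\langle c_a|E_i^\dagger E_j|c_b\rangle = 0$ derived above: measuring $S$ separates the erroneous state from the code.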

11.4.2 Pauli Observables for Quantum Error Correction 

The observations of section 11.4.1 suggest a general mechanism for constructing a code $C$ with
correctable error set $\mathcal{E}$ from a set of operators satisfying certain relations. Because of the
generalized Pauli group’s commutation relations, it is relatively easy to find sets of generalized Pauli
operators satisfying these relations. Because $Y^\dagger = -Y$, $X^\dagger = X$, and $Z^\dagger = Z$, any element of the
generalized Pauli group $\mathcal{G}_n$ that contains an even number of $Y$ terms and arbitrarily many $X$ and
$Z$ terms is Hermitian, and so can be viewed as an observable.

Let $S$ be an Abelian subgroup of $\mathcal{G}_n$ that does not contain $-I$. All elements of the generalized
Pauli group $\mathcal{G}_n$ square to either $I$ or $-I$. Since $S$ is a subgroup that does not contain $-I$, all elements
of $S$ square to $I$, which means they can only have ±1 as eigenvalues. Because $S$ is an Abelian
group in which all elements square to the identity, $S$ must be isomorphic to $\mathbf{Z}_2^k$ for some $k$. Let
$S_1, \dots, S_r$ be generators for $S$. Let $C$ be the subspace stabilized by $S$:

$$C = \{|v\rangle \in V \mid S_a|v\rangle = |v\rangle,\ \forall S_a \in S\}.$$

The next paragraph shows that $C$ has dimension $2^{n-r}$.

Let $C_i$ be the subspace stabilized by the first $i$ stabilizers:

$$C_i = \{|v\rangle \in V \mid S_j|v\rangle = |v\rangle,\ \forall\, 0 < j \le i\}.$$

Because all elements of $\mathcal{G}_n$ other than multiples of the identity have trace 0, and +1 and −1 are the
only eigenvalues of $S_a$, the +1-eigenspace of $S_a$ must have half the dimension of $V$. Thus, the subspace $C_1$ stabilized
by $S_1$ must have dimension half that of $V$: the subspace $C_1$ has dimension $2^{n-1}$. For all $i$,
the operator $P_i = \frac{1}{2}(I + S_i)$ is a projector onto the +1-eigenspace of $S_i$, so $C_1 = P_1 V$. Since
$S_2 P_1 = \frac{1}{2}(I + S_1)S_2$ has trace zero, exactly half of $C_1$ is in the +1-eigenspace of $S_2$. Thus $C_2$
has dimension $2^{n-2}$. Since $C = C_r$, induction yields $\dim C = 2^{n-r}$. To find $C$ explicitly, for any
element $S_a \in S$, the set of elements $\{S_a S_\beta \mid S_\beta \in S\} = S$ because $S$ is a group. Thus,

$$\frac{1}{\sqrt{|S|}} \sum_{S_a \in S} S_a|\psi\rangle$$

is stabilized by $S$, where $|\psi\rangle$ is any $n$-qubit state.

Let $\mathcal{E} \subset \mathcal{G}_n$ be a set of errors $\{E_i\}$ such that for all $i$ and $j$, either $E_i^\dagger E_j$ is in the stabilizer $S$
or anticommutes with at least one element of $S$. In other words,

$$E_i^\dagger E_j \notin Z(S) - S,$$

where $Z(S)$ is the centralizer of $S$, the subgroup of $\mathcal{G}_n$ that contains elements that commute with
all elements of $S$. As per section 11.4.1, if $E_i^\dagger E_j$ stabilizes $C$, then $\langle c_a|E_i^\dagger E_j|c_b\rangle = \delta_{ab}$, and if
$E_i^\dagger E_j$ anticommutes with an $S_l$, then $\langle c_a|E_i^\dagger E_j|c_b\rangle = 0$ unless $i = j$ and $a = b$. Thus, any $\mathcal{E}$
such that all $E_i, E_j \in \mathcal{E}$ satisfy $E_i^\dagger E_j \notin Z(S) - S$ is a correctable set of errors for code $C$. Of
particular interest is the maximal $t$ such that all errors $E_i$ and $E_j$ on $t$ or fewer qubits satisfy

$$E_i^\dagger E_j \notin Z(S) - S.$$

The distance $d$ of a stabilizer code is the minimum weight of an element in $Z(S) - S$. An
$[[n, k, d]]$ quantum code represents $k$-qubit message words in $n$-qubit codewords and has distance
$d$. We use double brackets to distinguish quantum from classical codes. An $[[n, k, d]]$-quantum
code is able to correct all errors of weight $t$ or less if $d \ge 2t + 1$.


11.4.3 Diagnosing and Correcting Errors 

Let $C$ be a stabilizer code with stabilizers $S$ given in terms of an independent generating set
$S_1, \dots, S_r$. Because $S$ is Abelian, measurements by different $S_i$ do not affect each other; the
probability that a state $|v\rangle \in V$ is measured and determined to be in the −1-eigenspace of $S_i$ is the
same no matter what other $S_j$ have been measured before. Measurement of all $r$ observables $S_i$
distinguishes $2^r$ subspaces $\{V_e\}$ of $V$, each of dimension $2^{n-r}$. Each subspace has a unique signature
$e$, a length $r$ bit-string whose $i$th bit $e_i$ indicates whether $V_e$ is in the +1 or −1-eigenspace of $S_i$:

$$V_e = \bigcap_i \big((-1)^{e_i}\text{-eigenspace of } S_i\big).$$

Any error $E \in \mathcal{G}_n$ either commutes or anticommutes with each $S_i$. The discussion of stabilizer
codes started with the observation that for any $|v\rangle$ stabilized by $S_i$, the state $E|v\rangle$ is in the +1-eigenspace
of $S_i$ if $E$ and $S_i$ commute, and in the −1-eigenspace of $S_i$ if they anticommute. Since
both $EC$ and $V_e$ have dimension $2^{n-r}$, the subspace $EC = V_e$ for some $e$. Recall from section
11.4.1 that if $\mathcal{E}$ is a correctable error set for code $C$, then for all $E_i$ and $E_j$ in $\mathcal{E}$, either $E_i^\dagger E_j$
anticommutes with some element of $S$ or is in $S$. If $E_i^\dagger E_j$ anticommutes with an element of $S$, then $E_i C$ and $E_j C$ are orthogonal
subspaces. If $E_i^\dagger E_j$ is in $S$, then $E_i|v\rangle = E_j|v\rangle$ for all $|v\rangle \in C$, and $E_i C = E_j C$. In the first case,
measurement by the $r$ observables $S_i$ distinguishes $E_i C$ from $E_j C$. In the second case, while the
measurement cannot determine whether error $E_i$ or error $E_j$ has occurred, it is not necessary to
know; applying either $E_i^\dagger$ or $E_j^\dagger$ returns the state to the correct original. Every $E_i$ in $\mathcal{E}$ is associated
with a unique signature $e$. When the $S_i$ are measured and an $r$-bit string $e$ is obtained, applying
$E_i^\dagger$ for any of the $E_i$ with signature $e$ will return the state to the correct one no matter which error
$E_j \in \mathcal{E}$ with that signature occurred.

Figure 11.2

Indirect measurement for operators $M$ that are both Hermitian and unitary. Measurement of the ancilla qubit in the
standard basis yields the same state on the quantum register with the same probability as would have been obtained by
direct measurement of the register with $M$.

Let $M$ be any Hermitian unitary operator on $n$ qubits. Because $M$ is both Hermitian and unitary,
an indirect measurement may be performed with an additional ancilla qubit and the circuit of
figure 11.2, where a gray circle means “measure according to the Hermitian operator encircled.”
In this case, measure the ancilla qubit with operator $Z$, a measurement in the standard basis. The
remainder of this section explains how this circuit achieves an indirect measurement according
to $M$.

Because $M$ is both unitary and Hermitian, its only possible eigenvalues are +1 and −1. The
circuit uses the fact that, for any $|\psi\rangle$, the state $c(|\psi\rangle + M|\psi\rangle)$ is a +1-eigenvector of $M$, and
the state $c'(|\psi\rangle - M|\psi\rangle)$ is a −1-eigenvector of $M$, where $c$ and $c'$ are the normalization factors
$c = 1/\big\||\psi\rangle + M|\psi\rangle\big\|$ and $c' = 1/\big\||\psi\rangle - M|\psi\rangle\big\|$: if we write $|\psi\rangle$ in terms of eigenstates for $M$, we
see that the −1-eigenvectors cancel in $|\psi\rangle + M|\psi\rangle$, leaving the +1-eigenvectors. Let $P^+$ be the
projection onto the +1-eigenspace of $M$, so $\frac{1}{2}(|\psi\rangle + M|\psi\rangle) = P^+|\psi\rangle$, and $P^-$ the projection onto
the −1-eigenspace of $M$, so $\frac{1}{2}(|\psi\rangle - M|\psi\rangle) = P^-|\psi\rangle$. Direct measurement of $|\psi\rangle$ according to
$M$ yields $c(|\psi\rangle + M|\psi\rangle)$ with probability $\langle\psi|P^+|\psi\rangle$ and $c'(|\psi\rangle - M|\psi\rangle)$ with probability $\langle\psi|P^-|\psi\rangle$.

This paragraph shows that the circuit of figure 11.2 yields these same states with the same
probability. Prior to measurement, the state is

$$\begin{aligned}
\frac{1}{\sqrt 2}\big(|+\rangle|\psi\rangle + |-\rangle M|\psi\rangle\big) &= \frac{1}{2}\big((|0\rangle + |1\rangle)|\psi\rangle + (|0\rangle - |1\rangle)M|\psi\rangle\big) \\
&= \frac{1}{2}\Big(\frac{1}{c}|0\rangle\big(c(|\psi\rangle + M|\psi\rangle)\big) + \frac{1}{c'}|1\rangle\big(c'(|\psi\rangle - M|\psi\rangle)\big)\Big).
\end{aligned}$$

Measurement of the ancilla qubit with $Z$ yields 0 with probability $p_+ = \langle\psi|P^+|\psi\rangle$ and results
in the $n$-qubit state $c(|\psi\rangle + M|\psi\rangle)$. Similarly, the measurement yields 1 with probability $p_- =
\langle\psi|P^-|\psi\rangle$, resulting in $c'(|\psi\rangle - M|\psi\rangle)$ as the state of the $n$-qubit register. As an alternative
argument, since $M = P^+ - P^-$ and $I = P^+ + P^-$,

$$c(|\psi\rangle + M|\psi\rangle) = c\big((P^+ + P^-)|\psi\rangle + (P^+ - P^-)|\psi\rangle\big) = 2c\,P^+|\psi\rangle.$$

Thus, the circuit of figure 11.2 has the same effect on the $n$ computational qubits as a direct
measurement by $M$. To measure each of the $S_i$ for $1 \le i \le r$, we need $r$ such circuits and $r$
ancilla qubits. Measurement of these qubits yields the string $e$.
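
The argument above can be replayed numerically. The sketch below is a minimal illustration that assumes the circuit of figure 11.2 consists of a Hadamard gate on the ancilla, an $M$ controlled by the ancilla, and a second Hadamard gate on the ancilla, as the state written above suggests. The choice $M = Z \otimes Z$, the random test state, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

Z = np.array([[1, 0], [0, -1]])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
M = np.kron(Z, Z)            # a Hermitian, unitary observable on n = 2 qubits
dim = M.shape[0]
I = np.eye(dim)

# Random normalized register state |psi>.
psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi /= np.linalg.norm(psi)

# Ancilla is the first tensor factor: H, then controlled-M, then H.
P0 = np.array([[1, 0], [0, 0]])
P1 = np.array([[0, 0], [0, 1]])
controlled_M = np.kron(P0, I) + np.kron(P1, M)
circuit = np.kron(H, I) @ controlled_M @ np.kron(H, I)
state = circuit @ np.kron(np.array([1, 0]), psi)

# Amplitudes with ancilla outcome 0 come first, then those with outcome 1.
branch0, branch1 = state[:dim], state[dim:]

P_plus = (I + M) / 2
p0_circuit = np.vdot(branch0, branch0).real     # probability of ancilla outcome 0
p0_direct = np.vdot(psi, P_plus @ psi).real     # <psi|P+|psi>
same_probability = np.isclose(p0_circuit, p0_direct)
same_state = np.allclose(branch0, P_plus @ psi)  # unnormalized post-measurement state
print(same_probability, same_state)
```

The unnormalized branch for outcome 0 equals $P^+|\psi\rangle$, which agrees with $\frac{1}{2}(|\psi\rangle + M|\psi\rangle)$ in the derivation.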

11.4.4 Computing on Encoded Stabilizer States 

For stabilizer codes, certain operators $U$, including those in the Pauli group, have logically
equivalent operators $\tilde U$ that are simple to obtain. Chapter 12 shows that stabilizer codes also
have good error handling properties. Again we work backward; instead of defining an encoding
function and then finding logical operators for that encoding, we find candidate logical operators
and then use them to define the encoding. In particular, we find logical single-qubit Pauli operators
$\tilde Z_1, \dots, \tilde Z_k$ and use them to define an encoding function.

In general, a state can be defined in terms of operators for which it is an eigenstate; for example,
all of the standard basis elements for a $k$-qubit system can be defined in terms of whether they
are +1 or −1-eigenvectors for each of the operators $Z_1, \dots, Z_k$. As another example, section
10.2.4 defined cluster states in terms of the operators that stabilize them. Here, the encoding
function takes any standard basis vector $|b_1 \dots b_k\rangle$ to the unique state in the code $C$ that is a
$(-1)^{b_i}$-eigenstate of $\tilde Z_i$ for all $i$. The rest of this section describes this program in more detail,
developing the five-qubit code as a running example.


Example 11.4.2 The set of observables

$$\begin{aligned}
S_0 &= X \otimes Z \otimes Z \otimes X \otimes I \\
S_1 &= Z \otimes Z \otimes X \otimes I \otimes X \\
S_2 &= Z \otimes X \otimes I \otimes X \otimes Z \\
S_3 &= X \otimes I \otimes X \otimes Z \otimes Z
\end{aligned}$$

defines a [[5, 1]] code. The four observables are independent, so each of the four observables
divides the $2^5$-dimensional space into two eigenspaces, leaving a space of $2^5/2^4 = 2^1$ dimensions
of codewords. This code satisfies the quantum Hamming bound with equality, and so is a perfect code.
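
The dimension count can be confirmed directly. The following NumPy sketch (an illustration under our own naming; `PAULI`, `op`, and `projector` are hypothetical helpers) multiplies the projectors $\frac{1}{2}(I + S_i)$ for the four observables and reads off the dimension of the stabilized subspace as the trace of the resulting projector.

```python
from functools import reduce
import numpy as np

PAULI = {
    "I": np.eye(2),
    "X": np.array([[0.0, 1.0], [1.0, 0.0]]),
    "Z": np.array([[1.0, 0.0], [0.0, -1.0]]),
}

def op(label):
    """Tensor product of single-qubit operators named by a string such as 'XZZXI'."""
    return reduce(np.kron, [PAULI[ch] for ch in label])

generators = ["XZZXI", "ZZXIX", "ZXIXZ", "XIXZZ"]  # S_0, S_1, S_2, S_3

# Product of the projectors (I + S_i)/2; the S_i commute, so this is the
# projector onto the subspace stabilized by all four observables.
projector = np.eye(32)
for g in generators:
    projector = projector @ (np.eye(32) + op(g)) / 2

dimension = int(round(np.trace(projector)))
print(dimension)
```

Each factor halves the dimension, from 32 down to 2, in line with the induction argument of section 11.4.2.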

Consider an element $A \in Z(S)$. For any $|v\rangle \in C$, the state $A|v\rangle$ is also in $C$:

$$S_i A|v\rangle = A S_i|v\rangle = A|v\rangle,$$

so $A|v\rangle$ is a +1-eigenstate of all the $S_i$. If $A$ is in $S$, then $A|v\rangle = |v\rangle$, but for all $A$ in $Z(S) - S$, $A$
acts nontrivially on $C$. If $A_1 = A_2 S_a$ for some $S_a \in S$, then $A_1$ and $A_2$ behave in the same way
on $C$. All the elements of the quotient group $Z(S)/S$ act in distinct ways on $C$. To understand
how they act on $C$, we need to know more about the structure of the centralizer $Z(S)$.

The symplectic view of $\mathcal{G}_n$ illuminates the structure of $Z(S)$. Recall from section 11.2.11 that
any element of $\mathcal{G}_n$ can be written uniquely as

$$\mu\,(X^{a_1} \otimes \cdots \otimes X^{a_n})(Z^{b_1} \otimes \cdots \otimes Z^{b_n}),$$

so to each element of $\mathcal{G}_n$ there is an associated $2n$-bit string

$$(a|b) = a_1 \dots a_n b_1 \dots b_n.$$

Moreover, the map $h : \mathcal{G}_n \to \mathbf{Z}_2^{2n}$ that sends each element to its bit string is a group homomorphism, where

$$(a|b) \cdot (a'|b') = (a \oplus a'|b \oplus b').$$

The homomorphism $h$ is four-to-one and loses the phase information contained in $\mu$. Since $S$
does not contain $-I$, and therefore does not contain $iI$ or $-iI$, no two elements of $S$ map to
the same string, so on $S$ the homomorphism $h$ is one-to-one. The elements $S_1, \dots, S_r$ of $\mathcal{G}_n$ are
independent, meaning that none of them can be written as a product of the others, if and only if
the corresponding bit strings $(a|b)$ are linearly independent.

Two elements $g$ and $g'$ of $\mathcal{G}_n$ commute if and only if

$$a \cdot b' + a' \cdot b = 0 \bmod 2,$$

where $a \cdot b'$ is the usual inner product, the sum of bitwise multiplication of the corresponding bits,
and $(a|b) = h(g)$ and $(a'|b') = h(g')$. The expression $a \cdot b' + a' \cdot b \bmod 2$ is called the symplectic
inner product.
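
The commutation test translates into a few lines of code. The Python sketch below is an illustration with hypothetical helper names (`to_bits`, `symplectic_product`); it ignores the phase $\mu$, as $h$ does, and treats a $Y$ in a string as contributing a 1 to both $a$ and $b$.

```python
def to_bits(pauli):
    """Map a string such as 'XZZXI' to the pair of bit tuples (a, b), ignoring phase."""
    a = tuple(1 if p in "XY" else 0 for p in pauli)
    b = tuple(1 if p in "ZY" else 0 for p in pauli)
    return a, b

def symplectic_product(g, h):
    """Return a.b' + a'.b mod 2; the value 0 means the two elements commute."""
    (a, b), (a2, b2) = to_bits(g), to_bits(h)
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    return (dot(a, b2) + dot(a2, b)) % 2

# The four observables of example 11.4.2 commute pairwise.
generators = ["XZZXI", "ZZXIX", "ZXIXZ", "XIXZZ"]
all_commute = all(symplectic_product(g, h) == 0
                  for g in generators for h in generators)

# An X error on the first qubit anticommutes with Z (x) Z (x) X (x) I (x) X.
x0_vs_s1 = symplectic_product("XIIII", "ZZXIX")
print(all_commute, x0_vs_s1)
```

Working with bit strings in this way avoids building $2^n \times 2^n$ matrices when only commutation relations are needed.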


Example 11.4.3 The stabilizer group generated by the four observables of example 11.4.2 has
sixteen elements $S_a$:

$$\begin{aligned}
S_i S_i = I &= I \otimes I \otimes I \otimes I \otimes I & S_0 &= X \otimes Z \otimes Z \otimes X \otimes I \\
S_1 &= Z \otimes Z \otimes X \otimes I \otimes X & S_2 &= Z \otimes X \otimes I \otimes X \otimes Z \\
S_3 &= X \otimes I \otimes X \otimes Z \otimes Z & S_0 S_1 &= -Y \otimes I \otimes Y \otimes X \otimes X \\
S_0 S_2 &= -Y \otimes Y \otimes Z \otimes I \otimes Z & S_0 S_3 &= -I \otimes Z \otimes Y \otimes Y \otimes Z \\
S_1 S_2 &= -I \otimes Y \otimes X \otimes X \otimes Y & S_1 S_3 &= -Y \otimes Z \otimes I \otimes Z \otimes Y \\
S_2 S_3 &= -Y \otimes X \otimes X \otimes Y \otimes I & S_0 S_1 S_2 &= -X \otimes X \otimes Y \otimes I \otimes Y \\
S_0 S_1 S_3 &= -Z \otimes I \otimes Z \otimes Y \otimes Y & S_0 S_2 S_3 &= -Z \otimes Y \otimes Y \otimes Z \otimes I \\
S_1 S_2 S_3 &= -X \otimes Y \otimes I \otimes Y \otimes X & S_0 S_1 S_2 S_3 &= I \otimes X \otimes Z \otimes Z \otimes X.
\end{aligned}$$

The following table shows the error syndrome for the code defined by these observables. For
single-qubit errors $X$, $Z$, and $Y$ on qubits 0 through 4, the corresponding column shows the result
of measurement with observable $S_i$ after that single error occurred on a codeword. The + and −
indicate whether the result of measurement with $S_i$ is +1 or −1, respectively. The
results of the four measurements identify the error uniquely. Counting + as 0 and − as 1, and reading
each column as a binary number with $S_0$ as the least significant bit, the last
row shows a unique decimal value, coming from measurement of all four observables.

         bit 0        bit 1        bit 2        bit 3        bit 4
        X   Z   Y    X   Z   Y    X   Z   Y    X   Z   Y    X   Z   Y
  S_0   +   -   -    -   +   -    -   +   -    +   -   -    +   +   +
  S_1   -   +   -    -   +   -    +   -   -    +   +   +    +   -   -
  S_2   -   +   -    +   -   -    +   +   +    +   -   -    -   +   -
  S_3   +   -   -    +   +   +    +   -   -    -   +   -    -   +   -
        6   9  15    3   4   7    1  10  11    8   5  13   12   2  14
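
The syndrome table can be regenerated from the four observables alone. The sketch below is an illustration with hypothetical helper names (`anticommutes`, `syndrome_value`); it uses the fact that a single-qubit error yields −1 for $S_i$ exactly when it anticommutes with the term of $S_i$ on that qubit, and it takes $S_0$ as the least significant bit, the convention that matches the decimal values in the last row of the table.

```python
def anticommutes(p, q):
    """Single-qubit X, Y, Z anticommute when both are non-identity and differ."""
    return p != "I" and q != "I" and p != q

generators = ["XZZXI", "ZZXIX", "ZXIXZ", "XIXZZ"]  # S_0, S_1, S_2, S_3

def syndrome_value(qubit, error):
    """Decimal syndrome of a single-qubit error, with S_0 as least significant bit."""
    bits = [1 if anticommutes(g[qubit], error) else 0 for g in generators]
    return sum(bit << i for i, bit in enumerate(bits))

# Columns in the order of the table: X, Z, Y on qubit 0, then qubit 1, and so on.
table = [syndrome_value(q, e) for q in range(5) for e in "XZY"]
print(table)
```

The fifteen values are distinct and nonzero, so together with the trivial syndrome 0 they use up all $2^4$ possible measurement results, which reflects the fact that the code is perfect.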


Let $S_1, \dots, S_r$ be an independent generating set for $S$. Form the $r \times 2n$ binary matrix

$$M = \begin{pmatrix} (a|b)_1 \\ (a|b)_2 \\ \vdots \\ (a|b)_r \end{pmatrix}$$

with the $(a|b)_i = h(S_i)$ as the rows. Because the $S_i$ are independent, so are the rows of $M$, and
so $M$ has rank $r$. The matrix $M$ acts on a $2n$-bit string $(a|b)$, viewed as a column vector $\binom{b}{a}$,
to produce a length $r$ vector in which the $i$th entry is the symplectic inner product of $(a|b)$ with
$(a|b)_i$. The matrix $M$ has a kernel of dimension $2n - r$. Elements of this kernel correspond to
elements of $\mathcal{G}_n$ that commute with all elements of the stabilizer. Thus, there are $4 \cdot 2^{2n-r}$ elements
in $Z(S)$, where the factor of 4 comes from the four possible values of $\mu$; these values of $\mu$ will
not be relevant for the remaining discussion because elements of $S$ are uniquely determined by
the corresponding string $(a|b)$. For an $[[n, k]]$ stabilizer code, the size of the stabilizer subgroup
is $2^{n-k}$. So for an $[[n, k]]$ code, there are $4 \cdot 2^{2n-r} = 4 \cdot 2^{n+k}$ elements in $Z(S)$.

Take $\tilde Z_1$ to be any element of $Z(S)$ that is independent of $S_1, \dots, S_r$. Form the $(r + 1) \times 2n$
binary matrix $M_1$ by adding as an additional row to $M$ the $2n$-bit string corresponding to $\tilde Z_1$.
The matrix $M_1$ has full rank $r + 1$. Let $C_1$ be the size $2^{2n-(r+1)} = 2^{n+k-1}$ set of binary strings,
viewed as column vectors $\binom{b}{a}$, that are in the kernel of $M_1$. Let $\tilde Z_2$ be any element of $Z(S)$
that corresponds to a bit string in $C_1$ and is independent of $S_1, \dots, S_r$ and $\tilde Z_1$. We can continue this process $k$ times to obtain operators
$\tilde Z_1, \dots, \tilde Z_k$ that commute with each other and with all elements of $S$. The kernel of $M_k$ then
consists of the strings corresponding to the group generated by $S$ and the $\tilde Z_i$.

Consider unencoded standard basis vectors for a moment. The $k$-qubit state $|00 \dots 0\rangle$ is the
unique +1-eigenstate of the $Z_1, \dots, Z_k$. More generally, the standard basis vector $|b_1 \dots b_k\rangle$ is
the unique state that is a $(-1)^{b_i}$-eigenstate of $Z_i$ for all $i$. For any $k$-bit string $b_1 \dots b_k$, there is a
unique element of the code $C$ that is a $(-1)^{b_i}$-eigenstate of $\tilde Z_i$ for all $i$; the argument that there
is a unique element is similar to the argument that established the dimension of $C$. We define an
encoding function $U_C$ for the code $C$ that takes standard basis elements to elements of $C$ that have
analogous eigenstate relations with the logical versions $\tilde Z_i$ of the $Z_i$. A $k$-qubit state $\sum_{x=0}^{2^k-1} \alpha_x|x\rangle$
is encoded as follows:

$$U_C : \sum_{x=0}^{2^k-1} \alpha_x|x\rangle \to \sum_{x=0}^{2^k-1} \alpha_x|\tilde x\rangle \qquad (11.10)$$

where $|\tilde x\rangle$ is the unique element of $C$ that is in the $(-1)^{x_i}$-eigenspace of $\tilde Z_i$ for all $0 < i \le k$.

The $(r + k) \times 2n = n \times 2n$ matrix $M_k$ has full rank; therefore for any $i$, there is a bit string
$(a|b)$ that, when viewed as a column vector $\binom{b}{a}$ and multiplied by $M_k$, yields the $n$-bit string $e_i$, which has a 1 in the $i$th
place and 0 elsewhere. In particular, there is a $2n$-bit string $(a|b)$ that satisfies

$$M_k \binom{b}{a} = e_{r+1},$$

where position $r + 1$ is the row of $M_k$ corresponding to $\tilde Z_1$. Let $\tilde X_1$ be the element of $Z(S)$ with bit string $(a|b)$ that yields $e_{r+1}$ when multiplied by $M_k$. Construct
$M_{k+1}$ by adding as a row to $M_k$ the bit string corresponding to $\tilde X_1$. Let $\tilde X_2$ be such that its bit
string $(a|b)$ satisfies

$$M_{k+1} \binom{b}{a} = e_{r+2}.$$

We can continue in this way until we obtain $\tilde X_1, \dots, \tilde X_k$. By construction, $\tilde X_i$ anticommutes with
$\tilde Z_i$, and commutes with all of $S$, all $\tilde X_j$, and all the $\tilde Z_j$ for $j \neq i$.


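Example 11.4.4 For the [[5, 1]] code of example 11.4.2, the binary matrix corresponding to the
independent generating set $\{S_i\}$ is

$$M = \begin{pmatrix}
1 & 0 & 0 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 1
\end{pmatrix}.$$

The bit string $(a|b) = (00000|11111)$ is independent of the rows $m \in M$ and satisfies $M\binom{b}{a} = 0$,
so we may take

$$\tilde Z = Z \otimes Z \otimes Z \otimes Z \otimes Z$$

and, since the string $(b|a) = (11111|00000)$ has symplectic inner product 0 with all rows of $M$ and
symplectic inner product 1 with $(a|b)$, we may take

$$\tilde X = X \otimes X \otimes X \otimes X \otimes X.$$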
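
The properties required of the two logical operators in this example can be checked over GF(2). The following sketch is an illustration under our own naming (`symplectic`, `rank_gf2`, `z_bar`, `x_bar` are hypothetical); it uses the convention of this section that a string lists the exponents of the $X$ terms first and that $M$ acts on the column vector with $b$ on top of $a$.

```python
import numpy as np

# Rows are the strings (a|b) of the four observables of example 11.4.2.
M = np.array([[1, 0, 0, 1, 0, 0, 1, 1, 0, 0],
              [0, 0, 1, 0, 1, 1, 1, 0, 0, 0],
              [0, 1, 0, 1, 0, 1, 0, 0, 0, 1],
              [1, 0, 1, 0, 0, 0, 0, 0, 1, 1]])
n = 5

def symplectic(M, a, b):
    """Apply M to the column vector (b over a): one symplectic product per row."""
    return M @ np.concatenate([b, a]) % 2

def rank_gf2(rows):
    """Rank over GF(2), computed with an XOR basis on integer-encoded rows."""
    basis = []
    for r in rows:
        v = int("".join(str(int(x)) for x in r), 2)
        for b in basis:
            v = min(v, v ^ b)   # clear the leading bit of b if it is set in v
        if v:
            basis.append(v)
    return len(basis)

z_bar = (np.zeros(n, dtype=int), np.ones(n, dtype=int))   # (a|b) for Z on every qubit
x_bar = (np.ones(n, dtype=int), np.zeros(n, dtype=int))   # (a|b) for X on every qubit

z_commutes = not symplectic(M, *z_bar).any()   # commutes with every generator
x_commutes = not symplectic(M, *x_bar).any()
z_independent = rank_gf2(list(M) + [np.concatenate(z_bar)]) == 5
xz_anticommute = int(z_bar[0] @ x_bar[1] + x_bar[0] @ z_bar[1]) % 2 == 1
print(z_commutes, x_commutes, z_independent, xz_anticommute)
```

Both operators commute with the stabilizer because every row of $M$ contains an even number of $X$ terms and an even number of $Z$ terms; they anticommute with each other because they overlap on an odd number of qubits.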


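Let $|\tilde e_i\rangle$ be the unique state in $C$ that is a −1-eigenstate of $\tilde Z_i$ but a +1-eigenstate for all the $\tilde Z_j$
with $j \neq i$. For $j \neq i$,

$$\tilde Z_j \tilde X_i|\tilde e_i\rangle = \tilde X_i \tilde Z_j|\tilde e_i\rangle = \tilde X_i|\tilde e_i\rangle,$$

so $\tilde X_i|\tilde e_i\rangle$ is a +1-eigenstate of $\tilde Z_j$ for $j \neq i$. For $\tilde Z_i$, since $\tilde Z_i|\tilde e_i\rangle = -|\tilde e_i\rangle$,

$$\tilde Z_i \tilde X_i|\tilde e_i\rangle = -\tilde X_i \tilde Z_i|\tilde e_i\rangle = \tilde X_i|\tilde e_i\rangle,$$

so $\tilde X_i|\tilde e_i\rangle$ is a +1-eigenstate of $\tilde Z_i$ as well. This calculation suggests that $\tilde X_i$ is the logical
analog of $X_i$ for $C$ with encoding $U_C$ of equation 11.10. A full proof is straightforward.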


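Example 11.4.5 For the [[5, 1]] code of example 11.4.2, the +1-eigenspace of $\tilde Z$ is spanned by
the set of standard basis states with an even number of 1s. Thus, we may take $|\tilde 0\rangle$ to be

$$\begin{aligned}
|\tilde 0\rangle &= \frac{1}{\sqrt{|S|}} \sum_{S_a \in S} S_a|00000\rangle \\
&= \frac{1}{4}\big(|00000\rangle + |10010\rangle + |00101\rangle + |01010\rangle \\
&\qquad + |10100\rangle - |10111\rangle - |11000\rangle - |00110\rangle \\
&\qquad - |01111\rangle - |10001\rangle - |11110\rangle - |11101\rangle \\
&\qquad - |00011\rangle - |01100\rangle - |11011\rangle + |01001\rangle\big),
\end{aligned}$$

and $|\tilde 1\rangle$ to be

$$|\tilde 1\rangle = \tilde X|\tilde 0\rangle,$$

a superposition of basis vectors with an odd number of ones.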
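
The expansion of the logical zero state can be reproduced by summing $S_a|00000\rangle$ over the sixteen elements of the stabilizer. The NumPy sketch below is an illustration with hypothetical helper names (`op`, `amp`); it forms each $S_a$ as a product of the generator matrices, so it does not depend on any convention for $Y$.

```python
import itertools
from functools import reduce
import numpy as np

PAULI = {
    "I": np.eye(2),
    "X": np.array([[0.0, 1.0], [1.0, 0.0]]),
    "Z": np.array([[1.0, 0.0], [0.0, -1.0]]),
}

def op(label):
    """Tensor product of single-qubit operators named by a string such as 'XZZXI'."""
    return reduce(np.kron, [PAULI[ch] for ch in label])

gens = [op(g) for g in ["XZZXI", "ZZXIX", "ZXIXZ", "XIXZZ"]]  # S_0 .. S_3

zero = np.zeros(32)
zero[0] = 1.0  # |00000>

# Sum S_a|00000> over all 16 products of generators, then scale by 1/sqrt(|S|).
logical_zero = np.zeros(32)
for subset in itertools.product([0, 1], repeat=4):
    chosen = [g for g, used in zip(gens, subset) if used]
    element = reduce(np.matmul, chosen, np.eye(32))
    logical_zero += element @ zero
logical_zero /= np.sqrt(16)

def amp(bits):
    """Amplitude of a basis state written as a bit string, first qubit leftmost."""
    return logical_zero[int(bits, 2)]

print(amp("00000"), amp("10111"), amp("01001"))
```

The amplitudes agree in sign with the expansion above, and the resulting vector is a normalized +1-eigenvector of all four observables supported on strings with an even number of ones.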


The construction of logical versions of other single-qubit gates and multiple-qubit gates for a 
stabilizer code C is more complicated. Chapter 12 looks at this issue in more detail and provides 
constructions for a universal approximating set of logical gates for the Steane code. 

11.5 CSS Codes as Stabilizer Codes 

Let $C_1$ and $C_2$ be $[n, k_1]$ and $[n, k_2]$ classical codes respectively, and suppose both correct $t$ errors.
Furthermore, suppose $C_2^\perp \subset C_1$. These codes satisfy the condition required for the construction
of an $[[n, k_1 - k_2^\perp]]$ CSS code, where $k_2^\perp = n - k_2$ is the dimension of $C_2^\perp$. This section describes an alternative to the CSS code
construction of section 11.3, one that uses the stabilizer viewpoint.

Let $P_1$ (resp. $P_2$) be the parity check matrix for the code $C_1$ (resp. $C_2$). For each row of $P_1$,
viewed as a bit string $b = b_1 \dots b_n$, construct an observable

$$Z^b = Z^{b_1} \otimes \cdots \otimes Z^{b_n},$$

as in section 11.3.2, where the rows of $P_1$ give the checks for bit-flip errors.
These $n - k_1$ observables are independent, since the rows of $P_1$ are linearly independent. For
each row of $P_2$, also construct an observable

$$X^b = X^{b_1} \otimes \cdots \otimes X^{b_n}.$$

These $n - k_2$ observables are also independent, and, since $X$ and $Z$ are independent, the entire
set of $2n - k_1 - k_2$ observables is independent. The group $S$ generated by these observables does
not contain $-I$, so $S$ defines a stabilizer code if and only if it is Abelian.

This paragraph shows that the CSS condition implies that $S$ is Abelian. Since $Z$ and $I$ commute,
all elements of $\{Z^a \mid a \in P_1\}$ commute. Similarly, all elements of $\{X^b \mid b \in P_2\}$ commute. Since $X$
and $Z$ anticommute, the group elements $Z^a$ and $X^b$ commute if and only if $a \cdot b$ is even. So the
elements of $S$ commute if $a \cdot b = 0 \bmod 2$ for all rows $a$ of $P_1$ and rows $b$ of $P_2$. This equality
holds if $P_1 P_2^T = 0 \bmod 2$. Because $C_2^\perp \subset C_1$, the generator matrix $G_2^\perp$ of $C_2^\perp$ satisfies $P_1 G_2^\perp = 0$. Since
$G_2^\perp = P_2^T$, we have $P_1 P_2^T = 0$. Therefore, $S$ is Abelian and the subspace $C$ stabilized by $S$ is a
stabilizer code.
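
For a concrete pair of codes the condition $P_1 P_2^T = 0 \bmod 2$ is a one-line check. The sketch below is an illustration that assumes both codes are the [7, 4] Hamming code with the parity check matrix of section 11.3.3, as in the Steane code; the names `css_condition`, `commute`, and `abelian` are ours.

```python
import numpy as np

# Parity check matrix of the [7, 4] Hamming code, used for both C_1 and C_2.
P1 = np.array([[1, 1, 1, 0, 1, 0, 0],
               [1, 1, 0, 1, 0, 1, 0],
               [1, 0, 1, 1, 0, 0, 1]])
P2 = P1.copy()

# CSS condition: every row of P1 has even overlap with every row of P2.
css_condition = not np.any(P1 @ P2.T % 2)

def commute(z_row, x_row):
    """Z^a and X^b commute exactly when a.b is even."""
    return int(z_row @ x_row) % 2 == 0

# Pairwise check between each Z-type and each X-type generator.
abelian = all(commute(a, b) for a in P1 for b in P2)
print(css_condition, abelian)
```

The two tests are the same computation written two ways: the entries of $P_1 P_2^T \bmod 2$ are exactly the parities $a \cdot b$ that decide whether $Z^a$ and $X^b$ commute.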

The code $CSS(C_1, C_2)$ of section 11.3.2 is stabilized by $S$. Since $S$ has $n - k_1 + n - k_2$
independent generators, it stabilizes a subspace that encodes

$$n - (2n - k_1 - k_2) = k_1 + k_2 - n$$

qubits, that is, a subspace of dimension $2^{k_1 + k_2 - n}$. Since $CSS(C_1, C_2)$ encodes the same number
of qubits, $k_1 + k_2 - n$, it is the stabilizer code for $S$.


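Example 11.5.1 The Steane code revisited. The parity check matrix

$$P = \begin{pmatrix}
1 & 1 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 1 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 1
\end{pmatrix}$$

defines the [7, 4] Hamming code. The Steane code takes the [7, 4] Hamming code as both $C_1$ and
$C_2$. To obtain stabilizers for the Steane code, define an operator in $\mathcal{G}_7$ for each row in the parity
check matrix that has a $Z$ in every place a 1 occurs and an $I$ for every 0:

$$\begin{aligned}
&Z \otimes Z \otimes Z \otimes I \otimes Z \otimes I \otimes I \\
&Z \otimes Z \otimes I \otimes Z \otimes I \otimes Z \otimes I \\
&Z \otimes I \otimes Z \otimes Z \otimes I \otimes I \otimes Z.
\end{aligned}$$

For each row in the parity check matrix, also define an operator that has an $X$ wherever a 1 occurs:

$$\begin{aligned}
&X \otimes X \otimes X \otimes I \otimes X \otimes I \otimes I \\
&X \otimes X \otimes I \otimes X \otimes I \otimes X \otimes I \\
&X \otimes I \otimes X \otimes X \otimes I \otimes I \otimes X.
\end{aligned}$$

These six observables stabilize exactly the Steane code $C$, so the Steane code is a [[7, 1]] stabilizer
code.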


11.6 References 

Hungerford’s Abstract Algebra: An Introduction [159] includes a chapter on classical error
correction, as well as giving a thorough treatment of the group theory and linear algebra involved.
Wicker’s Error Control Systems for Digital Communication and Storage [283] discusses classical
error correction and includes chapters on the relevant algebra.

The nine-qubit code of section 11.1.3 was originally proposed by Shor [251]. The seven-qubit
code of section 11.3 was originally proposed by Steane [259]. The theory of stabilizer codes and
fault-tolerant implementation of them are discussed at length in Daniel Gottesman’s thesis [135].


11.7 Exercises 


Exercise 11.1. For the code C : -» defined by generator matrix 



give 

• the set of code words 

• two distinct parity check matrices 


Exercise 11.2. Computing a parity check matrix for a code specified by a generator matrix.

a. Show that adding a column of a generator matrix $G$ for a code $C$ to another column produces
an alternative generator matrix $G'$ for the same code $C$.

b. Show that for any $[n, k]$ code there is a generator matrix of the form $\binom{A}{I}$, where $A$ is an
$(n - k) \times k$ matrix and $I$ is the $k \times k$ identity matrix.

c. Show that if $G = \binom{A}{I}$, then the $(n - k) \times n$ matrix $P = (I|A)$, where $I$ is the $(n - k) \times
(n - k)$ identity matrix, is a parity check matrix for the code $C$.

d. Show that if a parity check matrix $P'$ has the form $(A|I)$, then $G' = \binom{I}{A}$ is a generator
matrix for the code.

Exercise 11.3. Show that the code $C_{PF}$ of section 11.1.2 corrects all linear combinations of
single-qubit phase-flip errors $\{I, Z_2, Z_1, Z_0\}$ on any superposition $a|\tilde 0\rangle + b|\tilde 1\rangle$.

Exercise 11.4. Show that the code $C_{PF}$ of section 11.1.2 does not correct bit-flip errors.

Exercise 11.5. Show that if an $[[n, k, d]]$-quantum code is able to correct all errors of weight $t$
or less, $d \ge 2t + 1$.

Exercise 11.6. Show that all Hamming codes have distance 3 and so correct single bit-flip errors.

Exercise 11.7. Show that Shor’s code is a degenerate code.

Exercise 11.8. Alternative Steane code constructions.

a. Find a parity check matrix of the form $(I|A)$ for the Steane code.

b. Construct an alternative circuit, based on the parity check matrix found in (a), that can serve
as a syndrome extraction operator for the Steane code.

Exercise 11.9. Show that the generalized set of Pauli elements forms a basis for the linear 
transformations on the vector space associated with an n -qubit system. 

Exercise 11.10. 

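a. Show that for all $i$ and $j$ and for all orthonormal $|c_1\rangle \neq |c_2\rangle \in C$,

$$\langle c_1|Z_j^\dagger Y_i|c_2\rangle = 0$$

$$\langle c_1|I^\dagger Y_i|c_2\rangle = 0.$$

b. Show that for all $j \neq i$ and for all orthonormal $|c_1\rangle \neq |c_2\rangle \in C$,

$$\langle c_1|Y_j^\dagger Y_i|c_2\rangle = 0.$$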

Exercise 11.11. Describe how the Shor code can be used to correct single-qubit errors without 
making any measurements. 

Exercise 11.12. Show that if two blocks encoded according to code $C$ are subjected to an error
$E$ that is a superposition of errors $E = E_a \otimes E_b + E_c \otimes E_d$, where $E_a$, $E_b$, $E_c$, and $E_d$ are all
elements of a correctable set of errors $\mathcal{E}$ for $C$, then $E$ can be corrected.

Exercise 11.13. Suppose a single qubit $|\psi\rangle = a|0\rangle + b|1\rangle$ has been encoded using the Steane
code and that the error E = ^2 + ^Zj has occurred. Write down 

a. the encoded state, 

b. the state after the error has occurred, 

c. for each phase of the error correction, the syndrome and the resulting state, and 

d. each error correcting transformation applied and the state after each of these applications. 

Exercise 11.14. Show that for an $[[n, k]]$ quantum stabilizer code there is, for any $k$-bit string
$b_1 \dots b_k$, a unique element of the code $C$ that is a $(-1)^{b_i}$-eigenstate of $\tilde{Z}_i$ for all $i$.

Exercise 11.15. Show that the subspaces $V_e$ of section 11.4.3 are of dimension $2^{n-r}$.

Exercise 11.16. Show that the operators $\tilde{X}_i$, as defined for stabilizer codes in section 11.4.4, act
as a logical analog of the gates $X_i$ for the logical states obtained from the encoding c.

Exercise 11.17. Show that the [[9, 1]] Shor code is a stabilizer code. 

Exercise 11.18. Find alternative ways of implementing operations corresponding to X and Z on 
the logical qubits of the five-qubit stabilizer code. 

Exercise 11.19. Let $[[n, k, d]]$ be any nondegenerate code. Such a code can correct $t = \lfloor (d-1)/2 \rfloor$
errors. Show that tracing any codeword over any $n - t$ qubits results in the totally mixed state
$\rho = \frac{1}{2^t} I$ on the remaining $t$ qubits. Thus, all codewords are highly entangled states.

Fault Tolerance and Robust Quantum Computing


Quantum error correction by itself is not sufficient for robust quantum computation. Robust
computation means that arbitrarily long computations can be run to whatever accuracy desired. The
analyses of quantum error correction techniques in chapter 11 supposed that the error correcting 
procedures were carried out perfectly, an unrealistic assumption. Also, even if the environment 
interacts with the system only in ways that can be handled by the error correcting code, gates 
used as part of the computation may propagate errors in ways that produce errors the code cannot 
correct. To achieve robust quantum computation, quantum error correction must be combined 
with fault-tolerant techniques. 

This chapter presents one approach to robust quantum computation: error correction coupled 
with fault-tolerant procedures. Other approaches to robust quantum computation exist, both for 
quantum computation in the standard circuit model and for alternative models of computation. 
These alternative approaches will be touched on briefly in sections 13.3 and 13.4 respectively. 
The chapter concludes with a threshold theorem for one error model. Threshold theorems prove
that as long as the error rate is below a certain threshold, a quantum computer can run arbitrarily
long computations to arbitrarily high accuracy. This chapter uses a simple error model sufficient
to illustrate a general approach to fault-tolerant quantum computation: the strategy is to replace
a circuit with an expanded circuit that is more robust; if the original circuit’s chance of failing
was $O(p)$, the expanded circuit fails with only probability $O(p^2)$. Given a general method
for obtaining such expanded circuits, arbitrarily low probabilities of failure can be achieved by
concatenation; the expanded circuit can be replaced with a yet larger and more robust circuit,
and we can repeat this process, called concatenated coding, until the desired level of accuracy is
achieved. A key feature of concatenated coding is that only polynomial resources are required to
obtain exponentially low probabilities of failure.

Like quantum error correction, fault-tolerant quantum computing is a richly developed field. 
A variety of approaches have been developed, and threshold theorems for a variety of error 
models and codes have been proved. Fault-tolerant quantum computation remains an active area 
of research, and like quantum error correction, it will evolve as quantum information processing
devices are built, more realistic error models are learned, and more sophisticated quantum
computer architectures are developed.

As in chapter 11, this chapter concentrates on quantum error correcting codes that correct errors
of weight $t$ or less. The most important issue in ensuring that a circuit can always be replaced
with a more robust expanded circuit is the control of spread; if in the course of computation a
single error propagates to additional qubits before we are able to correct it, then the probability
that our computation becomes subject to an error that we cannot correct is much higher.
Fault-tolerant quantum computing methods aim to eliminate the propagation of errors from small
numbers of qubits to large numbers of qubits. All aspects of quantum computation must be made
fault-tolerant: error correction, the gates themselves, initial state preparation, and measurement.

The sections of the chapter gradually peel away at assumptions of perfection, adding mechanisms
to handle each source of errors, to arrive at a full program of fault-tolerant procedures
that support robust quantum computation. The chapter uses Steane’s seven-qubit code as a running
example. Section 12.1 discusses the setting in which we describe fault-tolerant techniques,
including the error model and when error correction steps are applied. Section 12.2 addresses
fault-tolerant quantum error correction and then examines the design of a full set of
fault-tolerant procedures for performing arbitrary computations on qubits encoded with the Steane
code. Section 12.3 describes concatenated coding leading to a threshold result.

12.1 Setting the Stage for Robust Quantum Computation 

To simplify the presentation, we consider only $[[k, 1]]$ quantum codes. Given a specific set of
universal gates and a specific $[[k, 1]]$ quantum error correcting code, fault-tolerant techniques aim
to take any circuit composed of those gates and produce a circuit on encoded qubits such that the
probability of a faulty computation is reduced even though more qubits and more operations are
involved. Fault-tolerant techniques for a given code address the question of how to implement
logical procedures on the computational qubits, syndrome extraction operators, measurements,
state preparation, and error correcting transformations in such a way that the resulting computation
is more robust than the original one; roughly, if the original circuit failed with probability
$O(p)$, then the expanded circuit fails with probability $O(p^2)$. Before describing fault-tolerant
techniques, we need to discuss when quantum error correcting operations are applied, and how
we will model the errors.

Let $Q_0$ be a quantum circuit for a computation we wish to make robust. We partition time into
intervals in which at most one gate acts on any qubit (figure 12.1). This partitioning is not unique:
for the circuit of figure 12.1, the single-qubit gate applied to the first qubit could have been placed
in the second time interval instead, or the first time interval could have been split in two with, for
example, the single-qubit operations performed in the first interval and the two-qubit operation in
the second. Given a partitioned circuit $Q_0$, we will define an expanded circuit $Q_1$, in which every
qubit expands to a block of $k$ qubits, and each time interval expands into two parts, one in which
procedures implementing the logical gates are carried out, and a second in which the syndrome is
measured and error correcting transformations are applied (figure 12.2). Both of these procedures
may require ancilla qubits. The expanded circuit is then subpartitioned into time intervals in which
no qubit is acted on by more than one gate. We could have chosen to apply error correction less
often. We have chosen to apply it as often as possible: at the end of each logical procedure. A
number of choices remain. Which procedures are used to implement the logical gates, and how the
syndrome and error correcting transformations are performed, determine whether the expanded
circuit $Q_1$ is more robust than the original circuit $Q_0$ or not.

Figure 12.1

The original circuit $Q_0$ for a computation we wish to make robust, partitioned into time intervals in which at most one
gate acts on any single qubit.

Figure 12.2

Schematic diagram showing the general structure, including the subpartitioned time intervals, for an expanded circuit for
a $[[7, 1]]$ quantum code. The expanded circuit alternates between carrying out logical procedures and performing error
correction (EC).
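The time partitioning described in this section can be sketched in a few lines of Python. This is only one of the many valid partitionings (a greedy, as-soon-as-possible one), and the gate representation, a tuple of the qubit indices a gate touches, is a hypothetical choice made for this sketch; the example circuit is likewise invented and is not the circuit of figure 12.1.

```python
def partition_into_intervals(gates):
    """Greedily assign each gate to the earliest time interval it can occupy.

    `gates` lists the gates in circuit order; each gate is a tuple of the
    qubit indices it acts on. In every returned interval, no qubit is
    acted on by more than one gate.
    """
    next_free = {}   # qubit -> index of the first interval in which it is unused
    intervals = []
    for gate in gates:
        t = max((next_free.get(q, 0) for q in gate), default=0)
        if t == len(intervals):
            intervals.append([])
        intervals[t].append(gate)
        for q in gate:
            next_free[q] = t + 1
    return intervals

# Hypothetical three-qubit circuit: a single-qubit gate on qubit 0, a
# two-qubit gate on qubits 1 and 2, then gates on (0, 1) and on qubit 2.
circuit = [(0,), (1, 2), (0, 1), (2,)]
intervals = partition_into_intervals(circuit)
print(intervals)  # [[(0,), (1, 2)], [(0, 1), (2,)]]
```

Delaying a gate to a later interval, or splitting an interval in two, would give other valid partitionings, which is the non-uniqueness noted in the text.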

For the purposes of describing fault-tolerant procedures, we use a model in which errors
only take place at the beginning of time intervals. We model imperfect single-qubit gates as
a single-qubit error followed by a perfect gate. Similarly, we model imperfect $C_{not}$ gates as
two single-qubit errors followed by a perfect gate. Correlations between these errors can be
ignored because quantum error correction is applied separately to each block and, within our
fault-tolerant procedures, we will allow $C_{not}$ transformations only between qubits in different
blocks. Errors due to interactions with the environment are modeled as occurring only at
the beginning of time intervals. For our initial discussion, we use the local and Markov error
model of section 10.4.4. This model means that each qubit, in each time interval, interacts only
with its own environment at the beginning of each time step (figure 12.3), and that the state
of the environment at the beginning of each time interval is uncorrelated with the state at all
previous times. The threshold theorem discussed in section 12.3.2 uses a more general error
model.



Figure 12.3 

Schematic diagram of the error model with environmental interactions at the beginning of each time interval. The boxes
representing the environmental interactions show each qubit interacting with its own environment of arbitrary size.

12.2 Fault-Tolerant Computation Using Steane's Code

In medicine, doctors take an oath, “first, do no harm.” While the quantum error correction
methods described in the previous chapter enable the correction of errors during the course of a 
quantum computation, they also increase the chance of errors, since they require more qubits and 
more gates. Analysis of quantum error correction in chapter 11 made the unrealistic assumption 
that the error correction steps were carried out perfectly. In fact, as we will see shortly, if we 
used the quantum error correction schemes of the last chapter exactly as described, imperfections 
in the process would likely introduce more errors than the process corrects. These schemes are 
not fault-tolerant. Fortunately, these schemes can be modified so that they do not introduce 
more errors than they correct. We illustrate fault-tolerant quantum error correction techniques by 
demonstrating how to make the Steane seven-qubit code fault tolerant. Fault-tolerant techniques 
put in safeguards so that a single error never propagates to multiple qubits, since multiple errors 
cannot be corrected by Steane’s code. The strategy is to replace parts that fail under a single-qubit
error with an ensemble of parts that fails only in the presence of two or more errors, so that, if
originally a part fails with probability $p$, the ensemble replacing it fails only with probability $cp^2$.

Section 12.2.1 illustrates ways in which the quantum error correction techniques of section 11.3.3 fail
to be fault tolerant. Section 12.2.2 shows how to perform error correction in a fault-tolerant way 
using the Steane code as the example. Section 12.2.3 develops fault-tolerant logical gates for 
the Steane code that limit the propagation of errors, and sections 12.2.4 and 12.2.5 deal with 
fault-tolerant measurement and fault-tolerant state preparation. Together, these procedures make 
the quantum computation robust against errors in the system and ancilla qubits, in the gates, in 
measurement, and in state preparation. 

12.2.1 The Problem with Syndrome Computation 

The computation of the syndrome is potentially dangerous to the computational state. Consider
the first of the six parity check circuits we gave for the Steane code, the one shown in figure 12.4,
which acts on the first, second, third, and fifth qubits of the encoded state and on the first ancilla
qubit. Steane’s code was designed to correct any single-qubit error on any one of the qubits. We
want to make sure that imperfections in carrying out the error correction scheme do not make
things worse. In particular, we want to make sure that a single error in carrying out the quantum
error correction does not result in multiple errors on the encoded qubits. Suppose a bit-flip error
occurs on an ancilla qubit leading to the “correction” of a nonexistent error. Such an occurrence is
annoying but not too serious; the “correction” only introduces a single-qubit error on the coding
qubits, which, as long as another error does not occur, will be corrected by the next round of
error correction. (Since the ancilla qubit will be used again only if it is first reset to $|0\rangle$, its error
does not propagate further.) There is a worse possibility, one that results in multiple errors on
the coding qubits, something that will not be corrected by subsequent rounds of error correction
even if no other error occurs. Take a minute to see if you can see the problem; spotting the
problem is a good test of whether you are thinking of quantum circuits in a quantum or classical
fashion.

Figure 12.4

One of the six syndrome computation circuits for Steane's seven-qubit code.

Syndrome extraction operators for quantum codes commonly use controlled gates. On the face
of it, controlled operations seem perfectly safe: how could computing from the computational
qubits to the ancilla qubits adversely affect the state of the computational qubits? However, as we
saw in caution 2 of section 5.2.4, the notions of from and to are basis dependent; in the Hadamard
basis the control and target qubits of a $C_{not}$ are reversed, and phase flips become bit flips and
vice versa. Consider for example what happens if the coding qubits $b_1, b_2, b_3, b_5$ are in the state
$|+\rangle$ and a $ZH$ error occurs on the ancilla qubit before the syndrome computation has begun. The
error places the ancilla qubit in the state $|-\rangle$ so that when each $C_{not}$ is performed it acts as the
control bit, with the result that all four qubits $b_1, b_2, b_3, b_5$ have been flipped to the $|-\rangle$ state. In
this way, a single error on the ancilla qubit propagates to multiple errors on the coding qubits.
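The propagation just described can be checked with a small state-vector simulation. The sketch below is an illustration under simplifying assumptions: it models only the four coding qubits touched by this parity check plus one ancilla (five qubits in all, not the full seven-qubit block), starts each coding qubit in $|+\rangle$ and the ancilla in $|-\rangle$ (the state left by the $ZH$ error), and applies the four $C_{not}$ gates from coding qubits to the ancilla. The helper `apply_cnot` and the qubit ordering are choices made here, not notation from the text.

```python
from functools import reduce
import numpy as np

PLUS = np.array([1, 1]) / np.sqrt(2)
MINUS = np.array([1, -1]) / np.sqrt(2)
N = 5  # qubits 0-3 stand for b1, b2, b3, b5; qubit 4 is the ancilla

def apply_cnot(state, control, target, n=N):
    """Apply a C_not by permuting basis-state amplitudes (qubit 0 is the leftmost bit)."""
    new_state = np.zeros_like(state)
    for index in range(2 ** n):
        if (index >> (n - 1 - control)) & 1:
            new_state[index ^ (1 << (n - 1 - target))] = state[index]
        else:
            new_state[index] = state[index]
    return new_state

# Coding qubits in |+>, ancilla already hit by the error and sitting in |->.
state = reduce(np.kron, [PLUS] * 4 + [MINUS])

# The syndrome circuit: each coding qubit controls a C_not onto the ancilla.
for data_qubit in range(4):
    state = apply_cnot(state, control=data_qubit, target=N - 1)

all_minus = reduce(np.kron, [MINUS] * 5)
print(np.allclose(state, all_minus))  # True: every coding qubit is now |->
```

Although each coding qubit is nominally the control, in the Hadamard basis the ancilla acts as the control, so its single phase error reaches all four coding qubits.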

12.2.2 Fault-Tolerant Syndrome Extraction and Error Correction 

The example of section 12.2.1 suggests that, to obtain fault-tolerant error correction, an ancilla
qubit should be connected with at most one coding qubit. To implement Steane’s code in a
fault-tolerant manner, we must use a circuit of the following form:

If we measure the ancilla qubits, however, we are in danger of gaining information about the
quantum state of the coded qubits, not just the error, which means the measurement is likely to
affect the state of the encoding qubits. For example, suppose the encoded state was
$\frac{1}{\sqrt{2}}(|\tilde{0}\rangle + |\tilde{1}\rangle)$ before a single-qubit error occurs on qubit $b_5$. Measurement of the ancilla qubits in the standard
basis will tell us that an error occurred on qubit $b_5$, but it will also destroy the superposition, so
that the error correcting operation will “restore” the state to $|\tilde{0}\rangle$ or $|\tilde{1}\rangle$ instead of the correct state
$\frac{1}{\sqrt{2}}(|\tilde{0}\rangle + |\tilde{1}\rangle)$.

The trick to avoid gaining too much information is to initialize the ancilla qubits in a state from
which it is impossible to learn anything about the computational state. The four ancilla qubits
replace a single qubit in the non-fault-tolerant circuit, so from measuring all four qubits of the
ancilla, only one bit of information needs to be gained: the value of the corresponding syndrome
operator. Exercise 5.9 suggests how to achieve this result; a carefully designed initial starting
state $|\phi_0\rangle$ for the ancilla that becomes a second state $|\phi_e\rangle$ under a single-qubit bit-flip error on any
one of the four qubits will yield only one bit of information. Consider

$$|\phi_0\rangle = \frac{1}{\sqrt{8}} \sum_{d_H(x)\ \mathrm{even}} |x\rangle,$$

where the sum is over all strings with even Hamming weight, and

$$|\phi_e\rangle = \frac{1}{\sqrt{8}} \sum_{d_H(x)\ \mathrm{odd}} |x\rangle.$$

Under errors in the encoded qubits that would have resulted in syndrome state $|0\rangle$ in the original
syndrome computation, the ancilla remains in state $|\phi_0\rangle$. Under errors that would have resulted
in $|1\rangle$, the ancilla ends up in state $|\phi_e\rangle$. These two states are distinguished by a measurement in
the standard basis that yields a random even-weighted string in the no-error case and a random
odd-weighted string in the error case. This measurement provides only one bit of information.

One final problem needs to be addressed before a fault-tolerant implementation of the Steane
code syndrome measurement is obtained. The solution given above requires the preparation of
the state $|\phi_0\rangle$. We must make sure that we can prepare $|\phi_0\rangle$ in a fault-tolerant way. Our strategy
is not to use a state we prepare if it deviates too much from $|\phi_0\rangle$. In particular, we want to make
sure that a faulty preparation does not produce errors in multiple coding qubits. Applying the
Walsh-Hadamard transformation to the cat state $\frac{1}{\sqrt{2}}(|0000\rangle + |1111\rangle)$ produces the state $|\phi_0\rangle$. The
circuit of figure 12.5 constructs the cat state in a non-fault-tolerant way. To see how to turn this
construction into a fault-tolerant one, let us look at what errors may occur.
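The claim that the Walsh-Hadamard transformation of the four-qubit cat state is the uniform superposition of even-weight strings, and that a bit flip on any one ancilla qubit turns it into the uniform superposition of odd-weight strings, can be verified numerically. The following sketch does so with NumPy; the variable and function names are chosen here for illustration.

```python
from functools import reduce
import numpy as np

H1 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
H4 = reduce(np.kron, [H1] * 4)          # Walsh-Hadamard transform on 4 qubits

cat = np.zeros(16)
cat[0b0000] = cat[0b1111] = 1 / np.sqrt(2)
ancilla = H4 @ cat                       # candidate for the even-weight state

weights = np.array([bin(x).count("1") for x in range(16)])
even_state = (weights % 2 == 0) / np.sqrt(8)
odd_state = (weights % 2 == 1) / np.sqrt(8)

def bit_flip(state, qubit):
    """Apply X to one qubit by exchanging amplitudes of x and x with that bit flipped."""
    return state[np.arange(16) ^ (1 << qubit)]

print(np.allclose(ancilla, even_state))                                    # True
print(all(np.allclose(bit_flip(ancilla, q), odd_state) for q in range(4)))  # True
```

Measuring either state in the standard basis gives a uniformly random string of the corresponding parity, so the parity is the only information the outcome carries.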

A single error in any one of the cat state qubits must not propagate to an error in more than one
of the coding qubits. Bit-flip errors in the overall construction of the ancilla state, even multiple
bit-flip errors resulting from a single error, are not a concern; the worst they do is cause an error in
the syndrome, which results in at most a single-qubit error when the “correction” corresponding to
this syndrome is carried out. Multiple phase errors resulting from a single error must be avoided,
since such errors could result in multiple errors in the coding qubits. Before the final Hadamard
transformations are applied, phase errors were bit-flip errors, so we must avoid bit-flip-error
propagation in the first part of the circuit. In the circuit of figure 12.5, a bit-flip error in either the
second or third qubit can propagate to the successive qubits. However, either of these bit flips
would mean that the first and fourth qubit have opposite values, whereas in the error-free case
the first and fourth qubits have the same value. If we insert a check for equality of these values,
we can discard the state and redo the preparation if the check fails. This single-qubit test suffices.
Figure 12.6 shows cat state ancilla preparation that includes this test.

Figure 12.5

Non-fault-tolerant construction of a cat state.

Figure 12.6

Fault-tolerant cat state ancilla preparation. The $Z$ measurement tests whether the first and fourth qubits have the same
value. If this measurement fails, the state is discarded, and the state preparation is repeated until the test is passed.

12.2.3 Fault-Tolerant Gates for Steane's Code 

To perform arbitrary quantum computations on the logical qubits of the Steane code, a universal 
set of fault-tolerant logical gates that can approximate any unitary operator on the logical qubits 
must be available. Even implementations of logical single-qubit gates may not be fault-tolerant, 
since they may propagate a single error to multiple qubits. For example, the most obvious, 
though far from optimal, way to carry out a logical single-qubit operation is to decode the logical 
qubit, apply a true single-qubit operation to the resulting single qubit, and then re-encode. Such an 
implementation is clearly not fault-tolerant; if an error occurs to the single qubit after the decoding,
upon re-encoding the error will propagate to all seven of the encoding qubits. Furthermore,
for logic gates involving more than one logical qubit, fault-tolerant implementations must not 
propagate a single error in one block to multiple errors in another. 

For the Steane code, it is easy to find fault-tolerant implementations for some gates, including
$X$, $H$, and $C_{not}$. For most other gates, it is challenging to find a fault-tolerant implementation, and
for some gates, including the Toffoli gate and $\pi/8$-gate, the only fault-tolerant implementations
known require auxiliary qubits. For a fault-tolerant implementation of the logical $X$ operation,
recall from section 11.3.3 that the logical qubit $|\tilde{0}\rangle$ is an evenly weighted superposition of all
elements of $C$, and that $|\tilde{1}\rangle$ is an evenly weighted superposition of all elements of $C^\perp - C$. Recall
further that the elements of $C^\perp$ that are not in $C$ are those obtained from elements of $C$ by adding
1111111. Thus, applying $X$ to every qubit in the seven-qubit block performs the logical $X$ gate
taking $|\tilde{0}\rangle$ to $|\tilde{1}\rangle$, and $|\tilde{1}\rangle$ to $|\tilde{0}\rangle$. Expanding on this reasoning using relations such as “adding any
element of $C$ to any element of $C^\perp$ results in an element of $C^\perp$” shows that the logical $C_{not}$ may
be implemented by applying $C_{not}$ operators between the corresponding qubits of the two blocks,
as shown in figure 12.7. Both of these implementations are fault-tolerant because a single error
cannot create multiple errors either in its own block or in another. Unfortunately, the transversal
strategy applied in these examples, where gates are applied only between corresponding qubits
in the blocks, does not work in most cases.

Figure 12.7

Fault-tolerant $C_{not}$.

Figure 12.8

The transversal approach does not result in a logical Toffoli gate T.

When the transversal strategy does not work, it can be highly nontrivial to find a fault-tolerant
implementation. The construction of fault-tolerant procedures depends on the code used. For some
codes it is not known how to construct fault-tolerant implementations of some logical gates. Even
for the Steane code, most single-qubit operations cannot be implemented transversally. Applying
the phase gate $P_{\pi/2} = |0\rangle\langle 0| + i|1\rangle\langle 1|$ of section 5.5 to all seven qubits of the Steane code results
in the logical gate $|\tilde{0}\rangle\langle\tilde{0}| - i|\tilde{1}\rangle\langle\tilde{1}|$, which isn’t quite $\tilde{P}_{\pi/2}$. In this case, applying $|0\rangle\langle 0| - i|1\rangle\langle 1|$ to
each qubit results in a fault-tolerant implementation of $\tilde{P}_{\pi/2}$, but in other cases, there is no easy
fix. For example, not only does applying $P_{\pi/4} = |0\rangle\langle 0| + e^{i\pi/4}|1\rangle\langle 1|$ to each qubit not result in $\tilde{P}_{\pi/4}$,
but it is not possible to implement $\tilde{P}_{\pi/4}$ in any transversal way. No transversal implementation of
$\tilde{P}_{\pi/4}$ is known. For $\tilde{P}_{\pi/4}$, the only fault-tolerant implementations known require ancilla qubits.
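The behavior of these transversal gates on the Steane code can be checked directly on the 128-dimensional state vectors. The sketch below is an illustration under an assumption: the text does not restate the code here, so the $[7,3]$ code $C$ is taken to be the span of the rows of a standard Hamming parity check matrix (`GENERATORS`), with $|\tilde{0}\rangle$ the uniform superposition over $C$ and $|\tilde{1}\rangle$ the uniform superposition over $C + 1111111$. All function names are hypothetical.

```python
import itertools
import numpy as np

# Assumed generators of the [7,3] code C (rows of a standard Hamming parity check matrix).
GENERATORS = np.array([[0, 0, 0, 1, 1, 1, 1],
                       [0, 1, 1, 0, 0, 1, 1],
                       [1, 0, 1, 0, 1, 0, 1]])

def code_words(generators):
    """All binary linear combinations of the generator rows."""
    words = set()
    for coeffs in itertools.product([0, 1], repeat=len(generators)):
        word = np.array(coeffs) @ generators % 2
        words.add(tuple(int(b) for b in word))
    return sorted(words)

def uniform_superposition(words):
    """Evenly weighted superposition of the basis states labeled by `words`."""
    state = np.zeros(2 ** 7, dtype=complex)
    for word in words:
        state[int("".join(map(str, word)), 2)] = 1
    return state / np.linalg.norm(state)

C = code_words(GENERATORS)
zero_L = uniform_superposition(C)
one_L = uniform_superposition([tuple(1 - b for b in w) for w in C])  # C + 1111111

WEIGHTS = np.array([bin(i).count("1") for i in range(2 ** 7)])

def transversal_X(state):
    """X on every qubit complements each basis label, which reverses the vector."""
    return state[::-1]

def transversal_phase(state, theta):
    """Apply diag(1, e^{i theta}) to each of the seven qubits."""
    return np.exp(1j * theta * WEIGHTS) * state

print(np.allclose(transversal_X(zero_L), one_L))                          # logical X
print(np.allclose(transversal_phase(one_L, np.pi / 2), -1j * one_L))      # gives -i, not +i
print(np.allclose(transversal_phase(one_L, -np.pi / 2), 1j * one_L))      # inverse gate gives +i
print(abs(np.vdot(zero_L, transversal_phase(zero_L, np.pi / 4))))         # below 1: leaves the code
```

The last line shows why the naive transversal $\pi/8$-gate fails: codewords of weight 0 and weight 4 pick up different phases, so $|\tilde{0}\rangle$ is not even mapped to a multiple of itself.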

The logical Toffoli gate T cannot be implemented by the application of Toffoli gates to corresponding
qubits in the three blocks as shown in figure 12.8 (see exercise 12.4). Like the $\tilde{P}_{\pi/4}$ gate,
only nontransversal implementations of T are possible for the Steane code.

In order to show that fault-tolerant computation can be done on data encoded using the Steane
code, we need to show that all logical unitary operators can be approximated arbitrarily closely by
the application of a sequence of fault-tolerant gates. We give fault-tolerant versions of the logical
operations for the universally approximating set of gates described in section 5.5: the Hadamard
gate $H$, the phase gate $P_{\pi/2}$, the controlled-NOT gate $C_{not}$, and the $\pi/8$-gate $P_{\pi/4}$. Fault-tolerant
implementations for $\tilde{P}_{\pi/2}$ and $\tilde{C}_{not}$ have already been described. The logical Hadamard gate $\tilde{H}$
can be implemented transversally by applying $H$ to each of the qubits in the block. Finding a
fault-tolerant implementation of $\tilde{P}_{\pi/4}$ is more work.

A number of fault-tolerant implementations use the same key idea: many transforms that have
no direct fault-tolerant implementation can be implemented using a fault-tolerantly prepared
ancilla state. The trick is to use measurement. We illustrate these techniques by developing
a fault-tolerant implementation of the $\pi/8$-gate. It is perhaps unsurprising that the state
$|\pi/4\rangle = \frac{1}{\sqrt{2}}(|0\rangle + e^{i\pi/4}|1\rangle)$ can be used to implement the $\pi/8$-gate $P_{\pi/4}$. The circuit of figure 12.9 performs the
$\pi/8$-gate $P_{\pi/4}$ on any input state $|\psi\rangle$. Since we already know fault-tolerant implementations of
$\tilde{C}_{not}$, $\tilde{P}_{\pi/2}$, and $\tilde{X}$, we must find a fault-tolerant preparation of the encoded state $|\pi/4\rangle$ to realize this
circuit. We first consider fault-tolerant measurement, which will be used as part of fault-tolerant
state preparation.

Figure 12.9

A circuit that forms the basis for a fault-tolerant implementation of $\tilde{P}_{\pi/4}$. Application of $P_{\pi/2}X$ is conditional on the
outcome of the measurement with $Z$.
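The idea of implementing the $\pi/8$-gate with the help of the state $|\pi/4\rangle$ can be illustrated on unencoded qubits. The circuit diagram is not reproduced in this text, so the wiring below is an assumption that is consistent with the caption of figure 12.9: a $C_{not}$ controlled by the $|\pi/4\rangle$ qubit and targeting the input qubit, a standard-basis measurement of the input qubit, and $P_{\pi/2}X$ applied to the remaining qubit when the outcome is 1. The function names are hypothetical.

```python
import numpy as np

def P(theta):
    """The phase-shift gate |0><0| + e^{i theta}|1><1|."""
    return np.diag([1, np.exp(1j * theta)])

X = np.array([[0, 1], [1, 0]], dtype=complex)

def pi8_gate_via_ancilla(psi, outcome):
    """State of the ancilla qubit after the assumed circuit, given the measurement outcome."""
    ancilla = np.array([1, np.exp(1j * np.pi / 4)]) / np.sqrt(2)
    joint = np.kron(ancilla, psi).reshape(2, 2)   # joint[a, q]: a = ancilla bit, q = input bit
    joint[1] = joint[1, ::-1].copy()              # C_not: ancilla controls, input is the target
    remaining = joint[:, outcome]                 # project the input qubit onto |outcome>
    remaining = remaining / np.linalg.norm(remaining)
    if outcome == 1:
        remaining = P(np.pi / 2) @ X @ remaining  # conditional correction
    return remaining

def equal_up_to_phase(u, v):
    return bool(np.isclose(abs(np.vdot(u, v)), 1.0))

rng = np.random.default_rng(0)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)

target = P(np.pi / 4) @ psi
print([equal_up_to_phase(pi8_gate_via_ancilla(psi, m), target) for m in (0, 1)])  # [True, True]
```

Both outcomes occur with probability 1/2, and in either case the ancilla qubit ends in $P_{\pi/4}|\psi\rangle$ up to a global phase.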

12.2.4 Fault-Tolerant Measurement 

Recall from section 11.4.3 that if M is both Hermitian and unitary, an indirect measurement may 
be performed with an additional ancilla qubit using the following circuit: 



This construction is far from fault-tolerant, since a single error in the ancilla qubit could propagate
to all $n$ qubits. To make this construction fault-tolerant, we use a cat state as we did for the
fault-tolerant syndrome measurement of section 12.2.2. For the present fault-tolerant construction
we need an $n$-qubit cat state and, just as for fault-tolerant quantum error correction, we must
perform checks on the cat state we construct and discard any states that fail those tests. Indirect
measurement by $M$ has a fault-tolerant implementation whenever a version of $M$ controlled
correctly by the cat state can be constructed. If $M$ has a transversal implementation in terms of
single-qubit operators, a controlled version is easy to obtain: control each single-qubit operator
with the corresponding qubit in the cat state, so that either all single-qubit operators or none at all
are performed. The use of this construction is illustrated in the fault-tolerant preparation of the
state $|\pi/4\rangle$ described in the next section.

12.2.5 Fault-Tolerant State Preparation of $|\pi/4\rangle$

To prepare a state $|\phi\rangle$ fault-tolerantly, it suffices to find an efficiently and fault-tolerantly
implementable measurement operator $M$ for which $|\phi\rangle$ is an eigenstate. Any fault-tolerantly prepared
state that is not orthogonal to $|\phi\rangle$ yields $|\phi\rangle$ with positive probability when measured by $M$. When
an incorrect eigenstate is obtained after such a measurement, the process can be repeated until
the correct state is obtained or, in many cases, a fault-tolerant gate can be used to transform the
obtained eigenstate into the desired one.

To obtain an efficiently and fault-tolerantly implementable $M$ for the $|\pi/4\rangle$ state, we begin
with a general observation about the operators

$$P_\theta = |0\rangle\langle 0| + e^{i\theta}|1\rangle\langle 1|$$

and the states $|\theta\rangle = \frac{1}{\sqrt{2}}(|0\rangle + e^{i\theta}|1\rangle)$. Since $X$ has eigenvectors $|+\rangle$ and $|-\rangle$, with eigenvalues 1
and $-1$ respectively, $P_\theta X P_\theta^{-1}$ has eigenvectors $P_\theta|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + e^{i\theta}|1\rangle)$ and
$P_\theta|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - e^{i\theta}|1\rangle)$ with eigenvalues 1 and $-1$ respectively. At first, this fact does not seem useful; yes, $|\pi/4\rangle$
is an eigenstate of $M = P_{\pi/4} X P_{\pi/4}^{-1}$, but it is $P_{\pi/4}$ we are trying to implement in the first place.
However, the commutation relation $X P_\theta^{-1} = e^{-i\theta} P_\theta X$ implies that

$$M = P_{\pi/4} X P_{\pi/4}^{-1} = e^{-i\pi/4} P_{\pi/4} P_{\pi/4} X = e^{-i\pi/4} P_{\pi/2} X,$$

and we know how to implement $P_{\pi/2}$ and $X$ fault-tolerantly.
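The operator identities in this derivation are easy to confirm numerically on a single qubit. The following sketch, with names chosen here for illustration, checks the commutation relation for a generic angle, the two expressions for $M$, and the eigenvalues of the two states $P_{\pi/4}|\pm\rangle$.

```python
import numpy as np

def P(theta):
    """The phase-shift gate |0><0| + e^{i theta}|1><1|."""
    return np.diag([1, np.exp(1j * theta)])

X = np.array([[0, 1], [1, 0]], dtype=complex)
theta = 0.37  # any angle works for the commutation relation

# X P_theta^{-1} = e^{-i theta} P_theta X
commutation_holds = np.allclose(X @ np.linalg.inv(P(theta)),
                                np.exp(-1j * theta) * P(theta) @ X)

# M = P_{pi/4} X P_{pi/4}^{-1} = e^{-i pi/4} P_{pi/2} X
M_conjugated = P(np.pi / 4) @ X @ np.linalg.inv(P(np.pi / 4))
M_simplified = np.exp(-1j * np.pi / 4) * P(np.pi / 2) @ X

pi_over_4_state = np.array([1, np.exp(1j * np.pi / 4)]) / np.sqrt(2)   # P_{pi/4}|+>
other_state = np.array([1, -np.exp(1j * np.pi / 4)]) / np.sqrt(2)      # P_{pi/4}|->

print(commutation_holds, np.allclose(M_conjugated, M_simplified))
print(np.allclose(M_simplified @ pi_over_4_state, pi_over_4_state))   # eigenvalue +1
print(np.allclose(M_simplified @ other_state, -other_state))          # eigenvalue -1
```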

For the indirect measurement construction to work, we do not need to implement full controlled
versions of these gates; instead, we need only to implement versions that are correctly controlled
by the cat state used to fault-tolerantly implement the measurement, a much easier task. To obtain
the logical analog of indirect measurement by $M$, apply a controlled $e^{-i\pi/4}$ phase gate between the
first qubit of the cat state and the first qubit of the ancilla, followed by seven controlled $P_{\pi/2}X$
gates between the seven pairs of corresponding qubits of the cat state and the ancilla, which
together implement $\tilde{P}_{\pi/2}\tilde{X}$ (see figure 12.10). The cat state construction is then undone and the remaining qubit
measured in the standard basis. If the measurement result is 0, the desired state $|\pi/4\rangle$ is obtained.
If the measurement result is 1, the resulting state is $\frac{1}{\sqrt{2}}(|\tilde{0}\rangle - e^{i\pi/4}|\tilde{1}\rangle)$ and the desired state can be
obtained by applying $\tilde{Z}$.

Figure 12.10

Fault-tolerant construction of $|\pi/4\rangle$, where $R = e^{-i\pi/4}$ and $U = P_{\pi/2}X$. The first set of gates constructs a cat state, the next set
applies $M$ controlled by the cat state, then the cat state construction is undone and the first qubit of the cat state register
is measured. Either $|\pi/4\rangle$ or $P_{\pi/4}|-\rangle$ is obtained. In the latter case, a $Z$ operator could be applied to obtain $|\pi/4\rangle$.

To see that this circuit performs the measurement $M$, let us consider what happens at each stage.
The Hadamard transformation together with the six $C_{not}$ operations result in the state
$\frac{1}{\sqrt{2}}(|0\rangle^{\otimes 7} + |1\rangle^{\otimes 7})|\psi\rangle$.
The next eight gates perform $M$ on the computational qubits controlled by the cat state, which
results in the state

$$\frac{1}{\sqrt{2}}\left(|0\rangle^{\otimes 7}|\psi\rangle + |1\rangle^{\otimes 7} M|\psi\rangle\right).$$

The six $C_{not}$ gates that undo the cat state result in the state

$$\frac{1}{\sqrt{2}}\left(|0\rangle^{\otimes 7}|\psi\rangle + |1\rangle|0\rangle^{\otimes 6} M|\psi\rangle\right).$$

The final Hadamard transformation results in the state

$$\frac{1}{\sqrt{2}}\left(|+\rangle|0\rangle^{\otimes 6}|\psi\rangle + |-\rangle|0\rangle^{\otimes 6} M|\psi\rangle\right),$$

which is equal to

$$\frac{1}{2}\left(|0\rangle|0\rangle^{\otimes 6}(|\psi\rangle + M|\psi\rangle) + |1\rangle|0\rangle^{\otimes 6}(|\psi\rangle - M|\psi\rangle)\right).$$

Just as in section 11.4.3, one or the other of the eigenstates of $M$ is obtained when the first qubit
is measured in the standard basis.
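The last step of this argument can be illustrated with a two-qubit toy calculation. The sketch below is a simplification: it drops the six cat-state qubits that have been returned to $|0\rangle$, keeps only the first cat-state qubit and a single unencoded data qubit, and takes $M = e^{-i\pi/4}P_{\pi/2}X$ as in section 12.2.5. The names are chosen for illustration.

```python
import numpy as np

def P(theta):
    """The phase-shift gate |0><0| + e^{i theta}|1><1|."""
    return np.diag([1, np.exp(1j * theta)])

X = np.array([[0, 1], [1, 0]], dtype=complex)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
M = np.exp(-1j * np.pi / 4) * P(np.pi / 2) @ X

rng = np.random.default_rng(1)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)

# State after the cat state has been undone: (|0>|psi> + |1> M|psi>) / sqrt(2),
# with the control qubit written first.
state = (np.kron([1, 0], psi) + np.kron([0, 1], M @ psi)) / np.sqrt(2)

# Final Hadamard on the control qubit.
state = np.kron(H, np.eye(2)) @ state

branch0, branch1 = state[:2], state[2:]   # data-qubit components for outcomes 0 and 1
prob0, prob1 = np.linalg.norm(branch0) ** 2, np.linalg.norm(branch1) ** 2

print(np.allclose(branch0, (psi + M @ psi) / 2), np.allclose(branch1, (psi - M @ psi) / 2))
print(np.allclose(M @ branch0, branch0), np.allclose(M @ branch1, -branch1))
print(np.isclose(prob0 + prob1, 1.0))
```

Each branch is the projection of $|\psi\rangle$ onto an eigenspace of $M$, so measuring the control qubit leaves the data qubit in the $+1$ or $-1$ eigenstate.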

12.3 Robust Quantum Computation 

Section 12.3.1 describes concatenated coding that iteratively replaces a circuit with a larger and
more robust one. That section also analyzes how many levels of concatenation are required to
obtain a given accuracy to show that a polynomial increase in resources (qubits and gates) can
achieve an exponential increase in accuracy. With these tools in hand, section 12.3.2 describes a
threshold result.

12.3.1 Concatenated Coding

Let $Q_0$ be a time-partitioned circuit (section 12.1) for a computation we wish to make robust.
Suppose we want the computation to succeed with probability at least $1 - \epsilon$. Let $Q_{i+1}$ be the
time-partitioned circuit obtained by encoding each of the qubits of circuit $Q_i$ with Steane’s code,
replacing all of the basic gates used in $Q_i$ with fault-tolerant logical equivalents, and performing
fault-tolerant error correction after the completion of the logical equivalent of each of $Q_i$’s
time intervals. In other words, the circuit $Q_i$ is obtained from $Q_0$ by $i$ rounds of concatenated
coding. Figure 12.11 schematically represents two levels of concatenated coding. In circuit $Q_i$,
there are $i$ different levels of error correction: error correction on blocks of seven qubits is done
most often, and error correction on blocks corresponding to the final logical qubits least often.
This hierarchical application of error correction enables an exponential level of robustness to be
achieved with only polynomially many resources (qubits and gates).

Figure 12.11
Schematic diagram with circuits $Q_0$, $Q_1$, and $Q_2$ showing two levels of concatenated coding. The number of qubits is
only suggestive: for the Steane code, the circuit $Q_2$ would use forty-nine qubits for each qubit of $Q_0$.

This paragraph provides a rough heuristic argument for how polynomially many resources
suffice to obtain exponential accuracy. When encoding qubits using an error correcting code,
we can think of it roughly as having replaced parts that fail under a single-qubit error with an
ensemble of parts that fails only in the presence of two or more errors. If, within a given time
period, the probability of a part failing is $p$, the ensemble fails with probability $cp^2$. Suppose a
machine $M_0$ that is composed of $N$ parts, each of which fails with probability $p$ in a single time
interval, runs for $T$ time intervals. The chance that $M_0$ operates without fault is $(1 - p)^{NT}$. Suppose
a new machine, $M_1$, is created in which each of the $N$ parts is replaced by $K$ parts that together
perform the operation of the original part, and that while each part still fails with probability $p$,
the ensemble of $K$ parts fails to perform the desired operation with probability only $cp^2$ for some
constant $c < 1/p$. The basic parts of machine $M_1$ now can be replaced with ensembles of $K$
parts. After continuing in this way $i$ times, the hierarchical ensembles in machine $M_i$, making the
equivalent of a single part in machine $M_0$, fail to perform the desired operation with probability
only $c^{2^i - 1}p^{2^i}$, so overall the machine $M_i$ succeeds with probability $(1 - c^{2^i - 1}p^{2^i})^{NT}$. The number
of parts $K^i$ in the hierarchical ensemble corresponding to a single part in machine $M_0$ increases
only exponentially in $i$, while the accuracy increases doubly exponentially in $i$. Thus, for the
ensemble to achieve a failure rate of no more than $(1/2)^r$, we need encode only $O(\log_2 r)$ times:
For any $i > \log_2\left(\frac{\log_2 c - r}{\log_2(cp)}\right)$, the failure rate is less than $(1/2)^r$ because

$$i > \log_2\left(\frac{\log_2 c - r}{\log_2(cp)}\right)$$

implies that

$$2^i > \frac{r - \log_2 c}{-\log_2(cp)}.$$

The denominator $-\log_2(cp)$ is positive, since $cp < 1$, so

$$-2^i \log_2(cp) > r - \log_2 c,$$

which implies

$$-r > 2^i \log_2(cp) - \log_2 c = \log_2\left(\frac{(cp)^{2^i}}{c}\right).$$

Thus,

$$\left(\frac{1}{2}\right)^r > \frac{(cp)^{2^i}}{c},$$

where $\frac{(cp)^{2^i}}{c} = c^{2^i - 1}p^{2^i}$ is the failure rate we computed for an ensemble in machine $M_i$ that
replaces a single part in the original machine $M_0$.
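
The heuristic bound can be checked numerically. The sketch below is only an illustration of the argument in this section: the function names and the sample values of $p$, $c$, and $r$ are assumptions chosen for the example, and $K = 7$ is used simply because the Steane code encodes one qubit in seven.

```python
import math


def ensemble_failure(p, c, levels):
    """Failure probability c^(2^i - 1) * p^(2^i) of one level-i ensemble.

    Computed as (c*p)^(2^i) / c, the form used in the derivation.
    """
    return (c * p) ** (2 ** levels) / c


def levels_needed(p, c, r):
    """Smallest i whose ensemble failure rate is below (1/2)^r."""
    if not 0 < c * p < 1:
        raise ValueError("the heuristic argument needs 0 < c*p < 1")
    target = 0.5 ** r
    i = 0
    while ensemble_failure(p, c, i) >= target:
        i += 1
    return i


def closed_form_bound(p, c, r):
    """Right-hand side of i > log2((log2 c - r) / log2(c p))."""
    return math.log2((math.log2(c) - r) / math.log2(c * p))


# Illustrative numbers only: p = 0.01, c = 10 (so c*p = 0.1), target (1/2)^20.
p, c, r = 0.01, 10.0, 20
i = levels_needed(p, c, r)
print("levels of concatenation:", i)
print("closed-form lower bound on i:", closed_form_bound(p, c, r))
print("ensemble failure rate:", ensemble_failure(p, c, i))
print("parts per original part with K = 7:", 7 ** i)
```

The loop and the closed-form expression agree in the sense that the returned level is the first integer above the bound, while the number of parts grows only as $K^i$.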

12.3.2 A Threshold Theorem 

This section begins by stating a threshold theorem and explaining the meaning of the concepts 
used in the statement of the theorem. It then briefly describes more general threshold theorems, 
and numerical estimates for thresholds obtained so far. 

A Threshold Theorem For any $[[n, 1, 2r + 1]]$ quantum error correcting code that has a full set
of fault-tolerant procedures, there exists a threshold $p_T$ with the following properties. For any
$\epsilon > 0$, and any ideal circuit $C$, there exists a fault-tolerant circuit $C'$ that, under local stochastic
noise of error rate $p < p_T$, produces output that is within $\epsilon$, in the statistical distance metric, from
the output of $C$, and fewer than $a|C|$ qubits and time steps are used in $C'$, where the factor $a$ is
polylogarithmic in $|C|/\epsilon$ and $|C|$ is the number of locations in $C$.

An error correcting code has a full set of fault-tolerant procedures if it has fault-tolerant procedures
for a set of universal logical gates, error correction steps, state preparation, and measurement.
Suppose a circuit $C$ has been divided into time steps in which each qubit is subjected to at most
one preparation, gate, or measurement. A location in $C$ is a gate, a preparation, a measurement,
or a wait (the identity transformation storing the qubit for the next step). A fault-tolerant protocol
based on an error correcting code with a full set of fault-tolerant procedures replaces each location
with a fault-tolerant procedure followed by an accompanying fault-tolerant error correcting
procedure. The circuit $C'$ in the threshold theorem is obtained by applying the fault-tolerant protocol
iteratively, in each stage replacing the gates, preparations, and measurements contained in
the fault-tolerant procedures making up the circuit obtained in the previous round, as explained
in section 12.3.1. The number of iterations that need to be carried out depends on the accuracy
desired.

In a local stochastic error model, the probability of errors at all locations in a set of locations
during one time step decreases exponentially with the size of the set. More precisely, for every
subset $S$ of locations in a circuit $C$, the total probability of having faults at every location in $S$
(and possibly outside $S$ as well) is at most $\prod_{L_i \in S} p_i$, where $p_i$ is the fixed probability of having
a fault at location $L_i$. If the error probability $p_i$ for all locations is less than $p$, the error rate
is said to be less than $p$. The type of error that occurs at these locations is not specified; for
analysis, it is helpful to imagine that the error is chosen by an adversary who is trying to disrupt
the computations as much as possible. Stochastic means that the locations of the faults are chosen
randomly.

The statistical distance $\epsilon$ between two probability distributions $P$ and $Q$, with probabilities $p_i$
and $q_i$ respectively for the $N$ outcomes $i$, is the $L^1$-distance

$$\epsilon = \|P - Q\|_1 = \sum_{i=1}^{N} |p_i - q_i|.$$

In our case, let $P$ be the probability distribution over the measurement outcomes obtained if the
ideal circuit were executed perfectly and the resulting state were measured in the standard basis.
Let $Q$ be the probability distribution obtained by applying $C'$ under local stochastic noise of rate $p$
and then measuring the logical qubits. Prior to measurement, the output state of $C$ in the ideal case,
and the output state of $C'$ in the noisy case, can be written as density operators $\rho$ and $\sigma$ respectively.
The statistical distance between the distributions obtained by measuring $\rho$ and $\sigma$ in the standard
basis is bounded above by the trace distance between the Hermitian operators $\rho$ and $\sigma$, with equality
when $\rho - \sigma$ is diagonal in that basis. The trace distance
or trace metric comes from the trace norm for Hermitian operators: the trace norm $\|A\|_{Tr}$ of $A$ is
defined as $\|A\|_{Tr} = \mathrm{tr}|A|$, where $|A| = \sqrt{A^\dagger A}$ is the positive square root of the operator $A^\dagger A$. Let
$\rho$ and $\rho'$ be two density operators. The trace metric $d_{Tr}(\rho, \rho')$ on density operators is defined to be

$$\epsilon = d_{Tr}(\rho, \rho') = \|\rho - \rho'\|_{Tr} = \mathrm{tr}|\rho - \rho'|.$$
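
As a small numerical illustration of these two distances, the following sketch compares them for single-qubit density operators. It is restricted to $2 \times 2$ Hermitian matrices so that the eigenvalues can be written in closed form without any library; the function names and the example states are assumptions made for this illustration.

```python
import math


def statistical_distance(p, q):
    """L1 distance sum_i |p_i - q_i| between two outcome distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))


def trace_norm_2x2(a):
    """tr|A| for a 2x2 Hermitian matrix A (nested lists).

    For Hermitian A, tr|A| is the sum of the absolute values of the two
    eigenvalues, which follow from the (real) trace and determinant.
    """
    tr = (a[0][0] + a[1][1]).real
    det = (a[0][0] * a[1][1] - a[0][1] * a[1][0]).real
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    return abs((tr + disc) / 2) + abs((tr - disc) / 2)


def trace_distance_2x2(rho, sigma):
    """d_Tr(rho, sigma) = tr|rho - sigma| for single-qubit density operators."""
    diff = [[rho[i][j] - sigma[i][j] for j in range(2)] for i in range(2)]
    return trace_norm_2x2(diff)


def standard_basis_distribution(rho):
    """Outcome probabilities when rho is measured in the standard basis."""
    return [rho[i][i].real for i in range(len(rho))]


zero = [[1.0, 0.0], [0.0, 0.0]]    # |0><0|
mixed = [[0.5, 0.0], [0.0, 0.5]]   # maximally mixed state
plus = [[0.5, 0.5], [0.5, 0.5]]    # |+><+|

for name, sigma in (("mixed", mixed), ("plus", plus)):
    d_stat = statistical_distance(standard_basis_distribution(zero),
                                  standard_basis_distribution(sigma))
    print(name, d_stat, trace_distance_2x2(zero, sigma))
```

For the diagonal pair the two numbers coincide, while for $|0\rangle\langle 0|$ against $|+\rangle\langle +|$ the standard-basis statistical distance is smaller than the trace distance, since the off-diagonal part of the difference is invisible to that measurement.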

Threshold theorems have been obtained for other error models, including more general models.
For example, threshold theorems exist for error models in which each basic gate is replaced with
one that interacts with the environment but remains close to the original gate. More precisely,
each basic gate, acting perfectly, is modeled as $U \otimes I$, where $U$ is the basic gate acting on the
computational system and $I$ is the identity acting on the environment. Each perfect gate has a
noisy counterpart modeled by a unitary operator $V$ acting on the computational system together
with the environment, where $V$ is constrained to be within $\eta_T$ of $U \otimes I$,

$$\|V - U \otimes I\|_{Tr} < \eta_T$$

for some threshold value $\eta_T$. This noise model is quite general. In particular, it subsumes the local
stochastic error model.

Estimates for threshold values such as $p_T$ or $\eta_T$ have been obtained for a variety of codes,
fault-tolerant procedures, and error models. Early results had thresholds on the order of $\eta_T \approx 10^{-7}$,
and these have been improved to $\eta_T \approx 10^{-3}$. Further improvements are needed to reach $\eta_T = 10^{-2}$,
a value that begins to be in reach of implementation experiments. Better understanding of
realistic noise models, the development of more advanced error codes, and improved fault-tolerant
techniques and analyses will improve these values.

12.4 References

The first paper on fault tolerance was by Shor [252]. Preskill [233] surveys issues and results
related to fault tolerance. Both Aliferis [18] and Gottesman [138] give detailed, rigorous, but 
nevertheless highly readable accounts of fault tolerance, including threshold theorem proofs. 

Early threshold results were proved by Aharonov and Ben-Or [9, 10], Kitaev [173], and Knill, 
Laflamme, and Zurek [180, 181]. Improved error thresholds on the order of $10^{-5}$ have been
found; see, for example, Steane [263], Knill [178], and Aliferis, Gottesman, and Preskill [19]. 
Threshold results have been found for a variety of noise models, including non-Markovian noise 
[268]. Threshold results have also been found for alternative models of computation [219, 220] 
such as cluster state quantum computing, which will be discussed in section 13.4. Steane [260] 
estimates realistic decoherence times. 

Steane [261] overviews a number of different universally approximating finite sets of gates 
from the point of view of fault-tolerant quantum computing. Eastin and Knill [108] showed that 
no code admits a universal transversal gate set. Cross, DiVincenzo, and Terhal [92] provide a 
comparison of many quantum error correcting codes from the point of view of fault tolerance. 

12.5 Exercises 

Exercise 12.1. Show that a single-qubit gate followed by a single-qubit error is equivalent to a 
(possibly different) single-qubit error followed by the same gate. 

Exercise 12.2. Why do we consider the preparation of the cat state in figure 12.6 to be
fault-tolerant, even though it includes two $C_{not}$ gates from the qubit on which the $Z$ measurement is
made?

Exercise 12.3. What effect does applying $P_{\pi/2}$ to each of the qubits in the Steane seven-qubit
encoding have? 

Exercise 12.4. Show that the transversal circuit shown in figure 12.8 does not implement the
Toffoli gate. Consider the effect of the circuit on $|1\rangle \otimes |0\rangle \otimes |0\rangle$.

Exercise 12.5. Design a fault-tolerant version of the Toffoli gate for the Steane code.

13 Further Topics in Quantum Information Processing


This chapter gives brief overviews of topics that we were not able to discuss fully. Section 13.1 
surveys more recent results in quantum algorithms. Known limitations of quantum computation 
are discussed in section 13.2. Other approaches to robust quantum computation, as well as a few 
of the many advances in quantum error correction, are described in section 13.3. Section 13.4 
briefly describes alternative models of quantum computation, including cluster state quantum 
computation, adiabatic quantum computation, holonomic quantum computation, and topological 
quantum computation, and their implications for quantum algorithms, robustness, and approaches 
to building quantum computers. Section 13.5 makes a quick tour of the extensive area of quantum 
cryptography, and touches upon quantum games, quantum interactive protocols, and quantum 
information theory. Insights from quantum information processing that led to breakthroughs in 
classical computer sciences are discussed in section 13.6. 

Section 13.7 briefly surveys approaches to building quantum computers, starting with criteria 
for scalable quantum computers. This discussion leads into the consideration of simulations of 
quantum systems in section 13.8. Section 13.9 discusses the still poorly understood question of 
where the power of quantum computation comes from, with an emphasis on the status of entanglement.
Finally, section 13.10 discusses computation in theoretical variants of quantum theory.

This overview is not meant to be complete. In an area advancing as quickly as this one, 
there are new results every day. Exploring the quantum physics section of the e-print archive 
(http://arXiv.org/archive/quant-ph) is an excellent way to discover additional topics and to keep 
up with the latest developments in the field (but be aware that the papers there are not refereed). 

13.1 Further Quantum Algorithms 

After Grover’s algorithm, there was a hiatus of more than five years before a significantly different 
algorithm was found. The field advanced during this time, with researchers finding variants on 
the techniques of Shor and Grover to provide algorithms for a wider range of problems, but no 
algorithmic breakthroughs occurred. Grover and others extended his techniques to provide small 
speedups for a number of problems, as mentioned in section 9.6. Shor’s algorithms were extended 
to provide solutions to the hidden subgroup problem over a variety of non-Abelian groups that are
close to being Abelian [244, 162, 161, 29], including a solution for the hidden subgroup problem
for normal subgroups of arbitrary finite groups [147, 141] and groups that are almost Abelian in
the sense that the intersection of the normalizers for all subgroups is large [141]. On the negative
side, Grigni et al. [141] showed in 2001 that for most non-Abelian groups and their subgroups, the 
standard Fourier sampling method used by Shor and successors could yield only exponentially 
little information about the hidden subgroup. On the other hand, Ettinger et al. [114] showed in 
2004 that there is no information theoretic barrier to solving this problem; they showed that the 
query complexity of the general non-Abelian hidden subgroup problem is polynomial. 

Most researchers expect that quantum computers cannot solve NP-complete problems in polynomial
time. There is no proof (a proof would imply $P \neq NP$). As section 13.10 discusses in more
detail, Aaronson goes so far as to suggest that this limit on computational power be viewed as a
principle governing any reasonable physical theory capable of describing our universe. Much
attention has been given to candidate NP-intermediate problems, problems that are in NP, not in P, and
are not NP-complete. Ladner’s theorem says that if $P \neq NP$, then there exist NP-intermediate problems.
Factoring and the discrete logarithm problem are both candidate NP-intermediate problems.
Other candidate problems include graph isomorphism, the gap shortest lattice vector problem,
and many hidden subgroup problems [254, 13]. While polynomial time quantum algorithms have
been found for a few hidden subgroup problems, particularly cases that are close to Abelian, these
problems remain some of the most important open questions in the field of quantum computation.

Two special cases of the hidden subgroup problem have received the most attention: the symmetric
group $S_n$, the full permutation group of $n$ elements, and the dihedral group $D_n$, the group of
symmetries of a regular $n$-sided polygon. An early result of Beals [34] provided a quantum Fourier
transform for the symmetric group, but a solution to the hidden subgroup problem for the symmetric
group continues to elude researchers. This problem is of particular interest since a solution
would yield a solution to the graph isomorphism problem. The hidden subgroup problem for the
dihedral group attracted even more attention when Regev [237] showed in 2002 that any efficient
algorithm for the dihedral hidden subgroup problem that uses Fourier sampling, a generalization of
Shor’s technique, would enable the construction of an efficient algorithm for the gap shortest vector
problem, a problem of cryptographic interest. In 2003, Kuperberg found a subexponential (but
still superpolynomial) algorithm for the dihedral group [189], which Regev improved upon by
reducing the space requirements to polynomial while retaining the subexponential time complexity
[239]. Alagic et al. have extended these techniques to a solution of Simon’s problem for general
non-Abelian groups [17]. Lomont surveys hidden subgroup results and techniques in [191].

In 2002, Hallgren found an efficient quantum algorithm for solving Pell’s equation [146]. Solving
Pell’s equation is believed to be harder than factoring or the discrete log problem. The security
of the Buchmann-Williams classical key exchange and the Buchmann-Williams public key cryptosystem
is based on the difficulty of solving Pell’s equation. So even the Buchmann-Williams public
key cryptosystem, which was believed to have a stronger security guarantee than standard public
key encryption algorithms, is now known to be insecure in a world with quantum computers. In
2003, van Dam, Hallgren, and Ip found an efficient quantum algorithm for the shifted Legendre
symbol problem [272]. The shifted Legendre symbol problem is the basis for the security of some
algebraically homomorphic cryptosystems that are used, for example, in certain
cryptographic-grade random number generators. The existence of van Dam et al.’s algorithm means that quantum
computers can predict these random number generators, thus rendering them insecure. In
2007, Farhi, Goldstone, and Gutmann [115] found a quantum algorithm for evaluating NAND
trees in $O(\sqrt{N})$ time, a problem that had puzzled quantum computing researchers for many years.

In the past five years, a new family of quantum algorithms has been discovered that uses techniques
of quantum random walks to solve a variety of problems. Childs et al. [80] solve a black
box graph traversal problem in polynomial time that cannot be solved in subexponential time 
classically. Magniez et al. [201] prove a Grover-type speedup result for a different graph problem 
using a quantum random walk approach. Magniez and Nayak [200] apply quantum random walks 
to the problem of testing commutativity of a group, Buhrman and Spalek [75] to matrix product 
verification, and Ambainis [23] to element distinctness. Krovi and Brun [186] study hitting times 
of quantum walks on quotient graphs. Both Ambainis [22] and Kempe [169] provide overviews 
of quantum walks and quantum walk-based algorithms. 

Quantum learning theory [70, 246, 132, 160, 27] provides a conceptual framework that unites 
Shor’s algorithm and Grover’s algorithm. Quantum learning is part of computational learning 
theory that is concerned with concept learning. A concept is modeled by a membership function, 
a Boolean function $c : \{0, 1\}^n \to \{0, 1\}$. Let $C = \{c_i\}$ be a class of concepts. Generally, a quantum
learning problem involves querying an oracle $O_c$ for one of the concepts $c$ in $C$, and the
job is to discover the concept c. The types of oracles vary. A common one is a membership 
oracle, which upon input of x outputs c(x). Common models include exact learning and probably 
approximately correct (PAC) learning. In the quantum case, oracles output a superposition upon 
input of a superposition of inputs. Servedio and Gortler [132] establish a negative result, that the 
number of classical and quantum queries required for any concept class does not differ by more 
than a polynomial in either the exact or the PAC model. On the positive side, the same paper 
shows that for computational efficiency, rather than query complexity, the story is quite different. 
In the exact model, the existence of any classical one-way function guarantees the existence of a 
concept class that is polynomial-time learnable in the quantum case but not in the classical. For 
the PAC model, a slightly weaker result is known in terms of a particular one-way function. 

13.2 Limitations of Quantum Computing 

Beals and colleagues [35] proved that, for a broad class of problems, quantum computation can 
provide at most a small polynomial speedup. Their proof established lower bounds on the number 
of time steps any quantum algorithm must use to solve these problems. Their methods were used 
by others to provide lower bounds for other types of problems. Ambainis [21] found another 
powerful method for establishing lower bounds. 

In 2002, Aaronson answered negatively the question of whether there could be efficient quantum
algorithms for the collision problem [1]. His results were generalized by Shi and himself [248, 6].
This result was of great interest because it showed that there could not exist a generic quantum
attack on all cryptographic hash functions. Aaronson’s result says that any attack must use specific 
properties of the hash function under consideration. Shor’s algorithms break some cryptographic 
hash functions, and quantum attacks on others may yet be discovered. 

Section 9.3 showed that Grover’s search algorithm is optimal. In 1999, Ambainis [20], building 
on work of Buhrman and de Wolf [73] and Farhi, Goldstone, Gutmann, and Sipser [117], showed 
that for searching an ordered list, quantum computation can give no more than a constant factor 
improvement over the best possible classical algorithms. Childs and colleagues [81, 82] improved 
estimates for this constant. Aaronson [5] provides a high-level overview of the limits of quantum 
computation. 

13.3 Further Techniques for Robust Quantum Computation 

While quantum error correction is one of the most advanced areas of quantum information processing,
many open questions remain. As more quantum information processing devices are built,
finding quantum codes or other robustness methods optimized for the particular errors to which 
the devices are most vulnerable will remain a rich area of research. 

For transmitting quantum information, either as part of quantum communication protocols or 
to move information around inside a quantum computer, not only are efficient error detection 
and the trade-off between data expansion and strength of the code important, but the decoding 
efficiency is as well. One longtime frustration has been the difficulty of using certain classical 
codes with efficient decoding properties, such as low-density parity check (LDPC) codes, as the 
basis for constructing quantum codes with similarly efficient decoding. The duality constraint in 
the CSS code construction was too much of a barrier for these codes, and no one knew what else 
to do. In 2006, Brun, Devetak, and Hsieh realized that by using a side resource of entanglement 
between the sender and the receiver, quantum versions of many more classical codes, including 
LDPC codes, could be obtained [68, 137]. This construction may also be useful beyond quantum 
communication. 

Instead of encoding the states so that we can detect and correct common errors, we may be able 
to place the states in subspaces unaffected by these errors. Such approaches, complementary to 
the error correcting codes we have seen, go under the various headings of error avoiding codes, 
noiseless quantum computation, or, most commonly, decoherence-free subspaces. Under certain 
conditions, we expect a system to be subject to systematic errors affecting all the qubits of the 
system. The quantum codes we have seen, while effective on errors involving small numbers 
of qubits, are not effective on systematic errors affecting all qubits. Lidar and Whaley provide 
a detailed review of decoherence-free subspaces in [195]. Operator error correction [184, 185] 
provides a framework that unifies quantum error correcting codes and decoherence-free subspaces. 
Quantum computers built according to the topological model of quantum computation (described
in section 13.4.4) would have robustness built in from the start. 

Here we give a few simple examples to illustrate the general approach of decoherence-free
subspaces.

Example 13.3.1 Systematic bit-flip errors. Suppose a system tends to be subject to errors that
perform a quantum bit flip on all qubits of the system. Bit-flip errors have no effect on the states
$|++\rangle$ and $|--\rangle$ (or any linear combination of them). For example, a bit-flip error takes

$$|--\rangle = \frac{1}{2}(|0\rangle - |1\rangle)(|0\rangle - |1\rangle)$$

to

$$\frac{1}{2}(|1\rangle - |0\rangle)(|1\rangle - |0\rangle) = |--\rangle.$$

If we encode every $|0\rangle$ and $|1\rangle$ as the two-qubit states $|++\rangle$ and $|--\rangle$ respectively, we will have
succeeded in protecting our computational states from all systematic bit-flip errors by embedding
them in states of a $2n$-qubit system that are immune from these errors.
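
The invariance claimed in this example can be checked directly on amplitude vectors. The sketch below is only an illustration: states are plain Python lists of amplitudes in the standard basis, and the helper names `kron` and `flip_all` are made up for this example.

```python
import math


def kron(u, v):
    """Tensor product of two state vectors given as lists of amplitudes."""
    return [a * b for a in u for b in v]


def flip_all(state):
    """Apply X to every qubit: the amplitude of |b> moves to |not b>."""
    mask = len(state) - 1          # all-ones bit string for this register
    return [state[mask ^ i] for i in range(len(state))]


def close(u, v, tol=1e-12):
    return all(abs(a - b) < tol for a, b in zip(u, v))


s = 1 / math.sqrt(2)
plus, minus = [s, s], [s, -s]
pp, mm = kron(plus, plus), kron(minus, minus)      # |++> and |-->

# An arbitrary encoded qubit a|++> + b|--> (coefficients chosen for illustration).
a, b = 0.6, 0.8
encoded = [a * x + b * y for x, y in zip(pp, mm)]

print(close(flip_all(pp), pp), close(flip_all(mm), mm), close(flip_all(encoded), encoded))

# For contrast, the unencoded state |00> is not invariant.
zero_zero = [1.0, 0.0, 0.0, 0.0]
print(close(flip_all(zero_zero), zero_zero))
```

A single $X$ sends $|-\rangle$ to $-|-\rangle$; with two qubits flipped together the two signs cancel, which is why the encoded states come back exactly rather than up to a phase.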


Example 13.3.2 Systematic phase errors. Suppose a system tends to be subject to errors that
perform the same relative phase shift $E = |0\rangle\langle 0| + e^{i\phi}|1\rangle\langle 1|$ on all qubits of the system. If we
encode each single-qubit state $|0\rangle$ and $|1\rangle$ as the two-qubit states

$$|\psi_0\rangle = \frac{1}{\sqrt{2}}(|01\rangle + |10\rangle)$$

$$|\psi_1\rangle = \frac{1}{\sqrt{2}}(|01\rangle - |10\rangle),$$

the error becomes a physically irrelevant global phase, so the computational states are entirely
protected from these errors:

$$(E \otimes E)\frac{1}{\sqrt{2}}(|01\rangle \pm |10\rangle) = \frac{1}{\sqrt{2}}\left(|0\rangle \otimes e^{i\phi}|1\rangle \pm e^{i\phi}|1\rangle \otimes |0\rangle\right) = e^{i\phi}\frac{1}{\sqrt{2}}(|01\rangle \pm |10\rangle) \sim \frac{1}{\sqrt{2}}(|01\rangle \pm |10\rangle).$$

Thus, the two-dimensional space spanned by $\{|\psi_0\rangle, |\psi_1\rangle\}$ can be used as a binary quantum system
that is error-free within an environment that produces only systematic relative phase errors.
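
A similar numerical check works for the phase-error example. In this sketch the phase angle `phi` and the helper names are arbitrary choices for illustration; the systematic error multiplies the amplitude of each basis state by one factor of $e^{i\phi}$ per qubit that is in state $|1\rangle$.

```python
import cmath
import math


def apply_phase_to_all(state, phi):
    """Apply E = |0><0| + e^{i phi}|1><1| to every qubit of a state vector."""
    return [amp * cmath.exp(1j * phi * bin(i).count("1"))
            for i, amp in enumerate(state)]


def overlap(u, v):
    """Inner product <u|v> of two amplitude lists."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


s = 1 / math.sqrt(2)
psi0 = [0.0, s, s, 0.0]        # (|01> + |10>)/sqrt(2)
psi1 = [0.0, s, -s, 0.0]       # (|01> - |10>)/sqrt(2)
plus_plus = [0.5, 0.5, 0.5, 0.5]

phi = 0.7                      # arbitrary illustrative phase
for name, state in (("psi0", psi0), ("psi1", psi1), ("++", plus_plus)):
    after = apply_phase_to_all(state, phi)
    # |<state|after>| = 1 means the state changed only by a global phase.
    print(name, abs(overlap(state, after)))
```

Both encoded states return with overlap of modulus one, since every basis state in them contains exactly one $1$, whereas $|++\rangle$ mixes basis states with different numbers of $1$s and is genuinely disturbed.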

We would like to combine these approaches to obtain an encoding that protects against all
systematic qubit errors. A subspace that is immune to both systematic $X$ and $Z$ errors is certainly
immune to $Y = ZX$ errors, and therefore to any systematic single-qubit error, since it is immune
to any linear combination of these errors. The following example is due to Zanardi and Rasetti 
[292]. This example was designed for the error environment of qubits encoded in photon 
polarization as affected by a quartz crystal. This decoherence-free subspace method has been 
experimentally verified by Kwiat et al. [190]. 


Example 13.3.3 Systematic single-qubit errors. The reader can check that all quantum states
represented by the elements of the two-dimensional space spanned by the vectors

$$|\varphi_0\rangle = \frac{1}{2}(|1001\rangle - |0101\rangle + |0110\rangle - |1010\rangle)$$

$$|\varphi_1\rangle = \frac{1}{2}(|1001\rangle - |0011\rangle + |0110\rangle - |1100\rangle)$$

are left invariant by systematic $X$ and $Z$ errors. Since $|\varphi_0\rangle$ and $|\varphi_1\rangle$ are not orthogonal, we
cannot encode $|0\rangle$ and $|1\rangle$ as these two vectors. By using the Gram-Schmidt process, we can
find orthonormal vectors: we can replace $|\varphi_1\rangle$ with $|\varphi_1'\rangle$, the normalized component of $|\varphi_1\rangle$
perpendicular to $|\varphi_0\rangle$, by taking $|\varphi_2\rangle = |\varphi_1\rangle - \langle\varphi_0|\varphi_1\rangle|\varphi_0\rangle$, and then normalizing to obtain

$$|\varphi_1'\rangle = \frac{|\varphi_2\rangle}{\sqrt{\langle\varphi_2|\varphi_2\rangle}},$$

which by construction is perpendicular to $|\varphi_0\rangle$. By encoding all $|0\rangle$ and $|1\rangle$ as $|\varphi_0\rangle$ and $|\varphi_1'\rangle$, we
can protect against all systematic $X$ and $Z$ errors and therefore against all systematic single-qubit
errors. Thus, by embedding the states of an $n$-qubit system in the states of a $4n$-qubit system, we
have obtained a computational subspace immune to all systematic single-qubit errors.
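
The check left to the reader, together with the Gram-Schmidt step, can be sketched numerically as follows. The two spanning vectors are written as equal-weight combinations of four-qubit basis states with amplitudes $\pm 1/2$, as in the example; the helper names are assumptions made for this illustration, and only real amplitudes are handled.

```python
import math


def combo(terms):
    """Four-qubit state from (coefficient, bitstring) pairs."""
    out = [0.0] * 16
    for coeff, bits in terms:
        out[int(bits, 2)] += coeff
    return out


def inner(u, v):
    """Inner product of two real amplitude vectors."""
    return sum(a * b for a, b in zip(u, v))


def x_all(v):
    """Systematic bit flip: X applied to each of the four qubits."""
    return [v[15 ^ i] for i in range(16)]


def z_all(v):
    """Systematic phase flip: Z applied to each of the four qubits."""
    return [(-1) ** bin(i).count("1") * a for i, a in enumerate(v)]


def close(u, v, tol=1e-12):
    return all(abs(a - b) < tol for a, b in zip(u, v))


phi0 = combo([(0.5, "1001"), (-0.5, "0101"), (0.5, "0110"), (-0.5, "1010")])
phi1 = combo([(0.5, "1001"), (-0.5, "0011"), (0.5, "0110"), (-0.5, "1100")])

# Gram-Schmidt: remove the phi0 component from phi1, then normalize.
ov = inner(phi0, phi1)
phi2 = [b - ov * a for a, b in zip(phi0, phi1)]
norm = math.sqrt(inner(phi2, phi2))
phi1_prime = [a / norm for a in phi2]

print("overlap <phi0|phi1>:", ov)
print("<phi0|phi1'>:", inner(phi0, phi1_prime))
for name, v in (("phi0", phi0), ("phi1'", phi1_prime)):
    print(name, close(x_all(v), v), close(z_all(v), v))
```

Every basis state involved has exactly two $1$s, which explains the invariance under systematic $Z$; systematic $X$ merely swaps the terms of each vector in pairs.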


Decoherence-free subspace approaches have been developed for a variety of complex situations. 
See [195] for a survey. 

13.4 Alternatives to the Circuit Model of Quantum Computation 

The circuit model of quantum computing of section 5.6 is well designed for comparisons between 
quantum algorithms and classical algorithms. We have seen its use in comparing the efficiency of 
quantum algorithms to classical algorithms and for showing that any classical computation can 
be done on a quantum computer in comparable time. Other models rival the circuit model for 
inspiring the discovery of new quantum algorithms or for giving insight into the limitations of 
quantum computation. Furthermore, other models better support certain promising approaches 
toward ways of physically realizing quantum computers and understanding the robustness of these
implementations.

Two significant alternatives to the circuit model have been developed so far: cluster state
quantum computing and adiabatic quantum computing. The next four subsections briefly describe 
these two models and their applications, holonomic quantum computation, a hybrid of adiabatic 
quantum computing and the standard circuit model, and topological quantum computing, which 
is related to holonomic quantum computation. 

13.4.1 Measurement-Based Cluster State Quantum Computation 

The elegant cluster state model of quantum computation [235, 218] makes exceptionally clear 
use of quantum entanglement, quantum measurement, and classical processing. In contrast to the 
standard circuit model, cluster state quantum computation makes no use of unitary operations 
in its processing of information; all computations are accomplished by measurement of qubits 
in a cluster state, the maximally connected, highly persistent entangled states of section 10.2.4. 
In a cluster state algorithm, the order in which the qubits are measured is set; only the basis in 
which each of the qubits is measured is determined by the results of previous measurements. 

The initial cluster state is independent of the algorithm to be performed; it depends only on the 
size of the problem to be solved. All of the processing, including input and output, takes place 
entirely by a series of single-qubit measurements, so the entanglement between the qubits can 
only decrease in the course of the algorithm. For this reason, cluster state quantum computation is 
sometimes called one-way quantum computation. In cluster state quantum computation, the entanglement
creation and the computational stages of a quantum computation are cleanly separated.

Cluster state quantum computation has been shown to be computationally equivalent to the 
standard circuit model of quantum computation. Cluster states, therefore, provide a universal 
entanglement resource for quantum computation. The proof of computational equivalence relies 
on a mapping of the time sequence of quantum gates in a quantum circuit to a spatial dimension of 
the 2-D lattice in which the cluster state lives. The processing proceeds from left to right, with the 
input placed in states on the far left of the cluster, and the output appearing in the states on the far 
right of the cluster once the algorithm is complete. A single qubit in the quantum circuit model is 
mapped to a row of qubits in the cluster state; thus, the single qubits of the cluster state are distinct 
from the logical qubits being processed by the computation. Many qubits in the cluster are not 
associated with any qubit in the circuit model. These qubits connect the qubit rows and, together 
with measurement, enable quantum gates to be carried out as the qubits of the cluster are measured 
from left to right. The measurements of qubits in a single column can be carried out in parallel. 

General cluster state computations use more general structures than those arising as analogs
of quantum circuits. For example, the measurements do not necessarily proceed from left to
right, and rows in the cluster state may have no obvious meaning. There is no reason for them
to represent a logical qubit, a concept from the circuit model of quantum computation that does
not have an analog in general cluster state quantum computation. Any computation in the cluster
state model partitions the cluster into sets of qubits $Q_1, Q_2, \ldots, Q_L$. The qubits within a set can
be measured in any order; in particular, they may be measured in parallel. All qubits in the set $Q_j$
must be measured before any qubit of $Q_{j+1}$ is measured. How a qubit in $Q_{j+1}$ is measured may
depend on the results of measurements of qubits in $Q_1, Q_2, \ldots, Q_j$. The interpretation of the
final result may depend on measurement results obtained at earlier stages. Raussendorf, Browne,
and Briegel [235] define the logical depth of a computation to be the minimum number of sets $Q_j$
needed to carry out the computation. Both the interpretation of the final results and the decision
of what basis to use for a measurement given the previous results require classical computation
that must be taken into account in terms of the efficiency of the algorithm.
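
To make the partition into measurement sets concrete, here is a purely classical sketch. It assumes we are handed, for each qubit, the list of qubits whose measurement outcomes determine its measurement basis; this dependency table is hypothetical and not derived from any particular cluster state algorithm. Each qubit is placed in the earliest set consistent with its dependencies, so the number of sets equals the length of the longest dependency chain.

```python
def measurement_layers(depends_on):
    """Group qubits into sets Q_1, Q_2, ... so that every qubit's dependencies
    (qubits whose outcomes fix its measurement basis) lie in earlier sets."""
    layer = {}

    def place(qubit, path):
        if qubit in path:
            raise ValueError("cyclic dependency between measurements")
        if qubit not in layer:
            earlier = depends_on.get(qubit, ())
            layer[qubit] = 1 + max(
                (place(d, path | {qubit}) for d in earlier), default=0
            )
        return layer[qubit]

    for qubit in depends_on:
        place(qubit, frozenset())
    depth = max(layer.values(), default=0)
    return [sorted(q for q in layer if layer[q] == j) for j in range(1, depth + 1)]


# Hypothetical dependency table: q3 needs q1's outcome, q5 needs q3's and q4's, ...
adaptive = {"q1": [], "q2": [], "q3": ["q1"], "q4": ["q1", "q2"], "q5": ["q3", "q4"]}
print(measurement_layers(adaptive))

# With no dependencies at all, every qubit can be measured at once.
independent = {"q1": [], "q2": [], "q3": []}
print(len(measurement_layers(independent)))
```

The number of sets returned matches the logical depth only if the table lists exactly the dependencies the computation really needs; finding that minimal set of dependencies is the hard part and is not attempted here.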

For some computations, the logical depth is surprisingly low. For example, take any quantum 
circuit consisting entirely of elements of the Clifford group, the group generated by the $C_{not}$,
Hadamard, and $\pi/2$-phase shift gates. While the corresponding computation in the cluster state
model proceeds by measuring columns of qubits from left to right, it turns out that for all cluster
computations corresponding to Clifford group circuits, one can simply measure all the qubits at
once. Thus, the logical depth of computations using only Clifford gates is 1; there are no
dependencies between the measurements needed to accomplish the computation. This result implies
that the only computation going on is the classical interpretation of the results and determination
of intermediate measurements. Thus, a quantum circuit consisting entirely of Clifford gates has
a classical analog of equivalent efficiency. This result, known as the Gottesman-Knill theorem
[133], is not trivial in that, for example, the Walsh-Hadamard transformation is contained in the
Clifford group. The cluster state model provides a particularly simple proof of this theorem. 
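
The classical simulability asserted by the Gottesman-Knill theorem can be illustrated with the standard stabilizer bookkeeping, which is a different route from the cluster state proof mentioned above. The Python sketch below is a minimal illustration under stated assumptions: it only tracks how Clifford gates transform the $n$ Pauli operators that stabilize a state prepared from $|0\ldots0\rangle$, it omits measurement, and the names `initial_stabilizers`, `apply_gate`, and `as_strings` are our own.

```python
PAULI = {(0, 0): "I", (1, 0): "X", (0, 1): "Z", (1, 1): "Y"}


def initial_stabilizers(n):
    """Stabilizer generators Z_0, ..., Z_{n-1} of the state |0...0>."""
    return [{"x": [0] * n, "z": [int(i == j) for j in range(n)], "sign": 0}
            for i in range(n)]


def apply_gate(stabs, gate, *qubits):
    """Conjugate every generator P by a Clifford gate U: P -> U P U^dagger."""
    for g in stabs:
        x, z = g["x"], g["z"]
        if gate == "H":
            (a,) = qubits
            g["sign"] ^= x[a] & z[a]      # H Y H = -Y
            x[a], z[a] = z[a], x[a]       # H exchanges X and Z
        elif gate == "S":                 # pi/2-phase shift gate
            (a,) = qubits
            g["sign"] ^= x[a] & z[a]      # S Y S^dagger = -X
            z[a] ^= x[a]                  # S X S^dagger = Y
        elif gate == "CNOT":
            a, b = qubits                 # a is the control, b the target
            g["sign"] ^= x[a] & z[b] & (x[b] ^ z[a] ^ 1)
            x[b] ^= x[a]                  # X on the control spreads to the target
            z[a] ^= z[b]                  # Z on the target spreads to the control
        else:
            raise ValueError(f"unsupported gate: {gate}")


def as_strings(stabs):
    """Render each generator as a sign followed by one Pauli letter per qubit."""
    return [("-" if g["sign"] else "+")
            + "".join(PAULI[(xi, zi)] for xi, zi in zip(g["x"], g["z"]))
            for g in stabs]


# Prepare a three-qubit GHZ state with Clifford gates only.
stabs = initial_stabilizers(3)
apply_gate(stabs, "H", 0)
apply_gate(stabs, "CNOT", 0, 1)
apply_gate(stabs, "CNOT", 1, 2)
ghz = as_strings(stabs)
```

Each gate touches a constant number of bits in each of the $n$ generators, so the bookkeeping grows polynomially with $n$, in contrast to the $2^n$ amplitudes of an explicit state vector.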

The cluster state model is of great theoretical interest since it clarifies the role of entanglement in
quantum computation and provides means of analyzing quantum computation. It has also had
substantial impact on approaches to building quantum computers, particularly optical quantum
computers. It will be discussed again in section 13.7 in that context. Furthermore, as will be discussed
in section 13.9, it has clarified the role of entanglement in quantum computation in surprising
ways.

13.4.2 Adiabatic Quantum Computation 

To describe adiabatic quantum computation, we must first describe the Hamiltonian framework for 
quantum mechanics on which it rests. Quantum systems evolve by unitary operators, so the state
of any system, initially in state $|\Psi_0\rangle$, as it evolves over time $t$ can be described by $|\Psi_t\rangle = U_t|\Psi_0\rangle$,
where $U_t$ is a unitary operator for each $t$. Furthermore, the evolution must be continuous and
additive: $U_{t_1+t_2} = U_{t_2}U_{t_1}$ for all times $t_1$ and $t_2$. Any unitary operator $U$ can be written $U = e^{-iH}$
for some Hermitian $H$. Any continuous and additive family of unitary operators can be written as
$U_t = e^{-itH}$ for some Hermitian operator $H$ called the Hamiltonian for the system. Schrödinger’s
equation provides an equivalent formulation: the Hamiltonian $H$ must satisfy

$$i\,\frac{d}{dt}|\Psi_t\rangle = H|\Psi_t\rangle$$

using units in which Planck’s constant $\hbar = 1$. Let $\lambda_0$ be the smallest eigenvalue of $H$. Any
$\lambda_0$-eigenstate of $H$ is called a ground state of $H$. The Hamiltonian framework and Schrödinger’s
equation can be found in any quantum mechanics book.
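
The properties of the Hamiltonian framework can be checked numerically on a small example. The following Python sketch assumes NumPy is available and uses an arbitrary $2 \times 2$ Hermitian matrix chosen purely for illustration; the helper name `evolution_operator` is our own. It builds $e^{-itH}$ from the eigendecomposition of $H$, then tests unitarity, additivity, and Schrödinger’s equation (by a finite difference), and reads off a ground state.

```python
import numpy as np


def evolution_operator(H, t):
    """U_t = exp(-i t H) for Hermitian H, via its eigendecomposition (hbar = 1)."""
    eigvals, eigvecs = np.linalg.eigh(H)
    return eigvecs @ np.diag(np.exp(-1j * t * eigvals)) @ eigvecs.conj().T


H = np.array([[1.0, 0.5], [0.5, -1.0]])  # arbitrary illustrative Hamiltonian
t1, t2 = 0.3, 1.1
U1, U2, U12 = (evolution_operator(H, t) for t in (t1, t2, t1 + t2))

unitary_ok = np.allclose(U12.conj().T @ U12, np.eye(2))
additive_ok = np.allclose(U12, U2 @ U1)

# Finite-difference check of i d/dt |psi_t> = H |psi_t> at t = t1.
psi0 = np.array([1.0, 0.0], dtype=complex)
dt = 1e-6
dpsi = (evolution_operator(H, t1 + dt) - evolution_operator(H, t1 - dt)) @ psi0 / (2 * dt)
schrodinger_ok = np.allclose(1j * dpsi, H @ (U1 @ psi0), atol=1e-6)

# eigh returns eigenvalues in ascending order, so column 0 is a ground state.
eigvals, eigvecs = np.linalg.eigh(H)
lambda_0, ground_state = eigvals[0], eigvecs[:, 0]
```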

To solve a problem using adiabatic quantum computation, an appropriate Hamiltonian $H_1$ must
be found, one for which a solution to the problem can be represented as the ground state of the
Hamiltonian. An adiabatic algorithm begins with the system in the ground state of a known and
easily implementable Hamiltonian $H_0$. A path $H_t$ is chosen between the initial Hamiltonian $H_0$
and the final Hamiltonian $H_1$, and the Hamiltonian is gradually perturbed to follow this
path. The theory of adiabatic quantum computation rests on the adiabatic theorem [210], which
says that as long as the path is traversed slowly enough the system will remain in the ground state,
and thus at the end of computation it will be in the solution state, the ground state of $H_1$. How
slowly the path must be traversed depends on the eigengap, the difference between the two lowest
eigenvalues. In general, it is hard to obtain bounds on this gap, so the art of designing an adiabatic
algorithm is first in finding a mapping of the problem to an appropriate Hamiltonian, and then in
finding a short path for which one can show that the eigengap never becomes too narrow.
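
For very small systems the eigengap along a path can simply be computed. The Python sketch below is a toy illustration under several assumptions of ours: NumPy is available, the path is the straight line $H(s) = (1 - s)H_0 + sH_1$ for $s \in [0, 1]$ (one common choice, not the only one), the initial Hamiltonian is $-(X \otimes I + I \otimes X)$, and the final Hamiltonian is a diagonal matrix of made-up costs whose smallest entry marks the solution.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)

# H0: its ground state is the uniform superposition over two qubits.
H0 = -(np.kron(X, I2) + np.kron(I2, X))
# H1: diagonal cost Hamiltonian (illustrative costs); ground state is |10>.
costs = [3.0, 1.0, 0.0, 2.0]
H1 = np.diag(costs)


def gaps_along_path(H0, H1, steps=101):
    """Eigengap of H(s) = (1 - s) H0 + s H1 at evenly spaced s in [0, 1]."""
    gaps = []
    for s in np.linspace(0.0, 1.0, steps):
        eigvals = np.linalg.eigvalsh((1 - s) * H0 + s * H1)  # ascending order
        gaps.append(eigvals[1] - eigvals[0])
    return gaps


gaps = gaps_along_path(H0, H1)
min_gap = min(gaps)

# The ground state at the end of the path is the basis state of least cost.
_, vecs = np.linalg.eigh(H1)
solution_index = int(np.argmax(np.abs(vecs[:, 0])))
```

Direct diagonalization of this kind takes time exponential in the number of qubits, which is one way to see why analytic bounds on the gap are needed for problems of interesting size.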

Adiabatic quantum computation was introduced by Farhi, Goldstone, Gutmann, and Sipser 
[118]. Childs, Farhi, and Preskill [79] show that adiabatic quantum computation has some inherent
protection against decoherence, which means that it may be a particularly good model for
designing both robust implementations of quantum computers and robust algorithms. Roland
and Cerf [243] show how to recapture Grover’s algorithm, and the optimality proof, within the 
adiabatic context. 

Aharonov et al. [16] develop a model for adiabatic quantum computation and prove that it is 
computationally equivalent to universal quantum computation in the circuit model. Other models 
of adiabatic computation exist. Some are equivalent in power only to classical computation [63], 
while for others, the extent of their power is not yet understood. This situation complicates not
only discussions of adiabatic quantum computing but also implementation efforts. For example, 
some small adiabatic devices have been built for which it has not been possible to determine 
whether they perform universal quantum computation or not. 

Aharonov and Ta-Shma’s wide-ranging paper [15], after developing tools for adiabatic quantum 
computation, investigates the use of adiabatic models for understanding which states, particularly 
superpositions of states drawn from probability distributions, can be efficiently generated. Initial 
interest centered on the possibility of using adiabatic methods to develop a quantum algorithm 
to solve NP-complete problems [116, 78, 153], because adiabatic algorithms were not subject to 
the lower bound results proven for other approaches. Vazirani and van Dam [273] and Reichardt 
[240] rule out a variety of adiabatic approaches to solving NP-complete problems in polynomial 
time.

13.4.3 Holonomic Quantum Computation 

Holonomic, or geometric, quantum computation [293, 76] is a hybrid between adiabatic quantum
computation and the standard circuit model in which the quantum gates are implemented via
adiabatic processes. Holonomic quantum computation makes use of non-Abelian geometric phases
that arise from perturbing a Hamiltonian adiabatically along a loop in its parameter space. The
phases depend only on topological properties of the loop, and so are insensitive to perturbations.
This property means that holonomic quantum computation has good robustness with respect to
errors in the control driving the Hamiltonian’s evolution. Early experimental efforts have been 
carried out using a variety of underlying hardware. 

13.4.4 Topological Quantum Computation 

In 1997, prior to the development of holonomic quantum computation, Kitaev proposed
topological quantum computing, a more speculative approach to quantum computing that also has
excellent robustness properties [174, 125, 233, 87]. Kitaev recognized that topological properties 
are totally unaffected by small perturbations, so encoding quantum information in topological 
properties would give intrinsic robustness. The type of topological quantum computing Kitaev 
proposed makes use of the Aharonov-Bohm effect, in which a particle that travels around a 
solenoid acquires a phase that depends only on how many times it has encircled the solenoid. 
This topological property is highly insensitive to even large disturbances in the particle’s path. 

Kitaev defined quantum computation in this model and showed that, by using non-Abelian 
Aharonov-Bohm effects, such a quantum computer would be universal in the sense of being able 
to simulate computations in the quantum circuit model without a significant loss of efficiency. 
However, only a few non-Abelian Aharonov-Bohm effects have been found in nature, and all 
of these are unsuitable for quantum computation. Researchers are working to engineer such 
effects, but even the most basic building blocks of topological quantum computation have yet 
to be realized experimentally in the laboratory. In the long term, the robustness properties of 
topological quantum computing may enable it to win out over other approaches. In the meantime, 
it is of significant theoretical interest. For example, it led to a novel type of quantum algorithm 
that provides a polynomial time approximation of the Jones polynomial [11]. 

13.5 Quantum Protocols 

The most famous quantum protocols are quantum key distribution schemes, such as those of 
sections 2.4 and 3.4. Quantum key distribution was the first example of a quantum cryptographic 
protocol. Since then, quantum approaches to a wide variety of cryptographic and communication 
tasks have been developed. 

Some quantum cryptographic protocols, such as the quantum key distribution schemes we
described, use quantum means to secure classical information. Others secure quantum
information. Many are unconditionally secure in that their security is based entirely on properties
of quantum mechanics. Others are only quantum computationally secure in that their security
depends on a problem being computationally intractable for quantum computers. For example,
unconditionally secure bit commitment is known to be impossible to achieve through either
classical or quantum means [205, 197, 93]. Weaker forms of bit commitment exist. In particular,
quantum computationally secure bit commitment schemes exist as long as there exist
quantum one-way functions [8, 106]. Kashefi and Kerenidis discuss the status of quantum one-way
functions [168].

Closely related to quantum key distribution schemes are protocols for uncloneable encryption
[136]. Uncloneable encryption is a symmetric key encryption scheme that guarantees that an
eavesdropper cannot even copy an encrypted message, say for later attempts at decryption,
without being detected. In addition to providing a stronger security guarantee than most symmetric
key encryption systems, uncloneable encryption allows keys to be reused as long as eavesdropping
is not detected. Uncloneable
encryption has strong ties with quantum authentication [33]. One type of authentication is
digital signatures. Shor’s algorithms break all standard digital signature schemes. Quantum digital
signature schemes have been developed [139], but the keys involved can be used only a limited
number of times. In this respect they resemble classical schemes such as Merkle’s one-time digital
signature scheme [207].

Some quantum secret sharing protocols protect classical information in the presence of
eavesdroppers [151]. Others protect a quantum secret. Cleve et al. [86] provide quantum protocols
for $(k, n)$ threshold quantum secrets. Gottesman et al. [134] provide protocols for more general
quantum secret sharing. There is a strong tie between quantum secret sharing and CSS quantum 
error correcting codes. Quantum multiparty function evaluation schemes exist [91]. 

Fingerprinting is a mechanism for identifying strings such that equality of two strings can be 
determined with high probability by comparing their respective fingerprints. It has been shown 
that classical fingerprints for bit strings of length $n$ need to be of length at least $\Omega(\sqrt{n})$. Buhrman
et al. [72] show that a quantum fingerprint of classical data can be exponentially smaller; they
can be constructed with only $O(\log n)$ qubits.

In 2005, Watrous [280] was able to show that many classical zero-knowledge interactive 
protocols are zero knowledge against a quantum adversary. A significant part of the challenge was 
to find a reasonable and sufficiently general definition of quantum zero knowledge. The problems 
on which statistical zero-knowledge protocols are generally based are candidate NP-intermediate 
problems such as graph isomorphism, so for this reason also zero-knowledge protocols are of 
interest for quantum computation. Aharonov and Ta-Shma [15] detail intriguing connections 
between statistical zero-knowledge and adiabatic state generation. 

There is a close connection between quantum interactive protocols and quantum games. An 
introduction to this field is provided by [192]. Early work in this area includes a discussion of a
quantum version of the prisoner’s dilemma [110]. See Meyer [212] for a lively discussion of other
quantum games. Gutoski and Watrous [145] tie quantum games to quantum interactive proofs.

13.6 Insight into Classical Computation 

A number of classical algorithmic results have been obtained by taking a quantum information 
processing viewpoint. Kerenidis and de Wolf [170] and Wehner et al. [282] use quantum arguments
to prove lower bounds for locally decodable codes, Aaronson [2] for local search, Popescu
et al. [230] for the number of gates needed for a classical reversible circuit, and de Wolf [98]
for matrix rigidity. Aharonov and Regev [14] “dequantize” a quantum complexity result for a
lattice problem to obtain a related classical result. The usefulness of the complex perspective
for evaluating real valued integrals is sometimes used as an analogy to explain this phenomenon.
Drucker and de Wolf survey these and other results in [105]. We know of two additional examples
that were not included in their survey. One is the Gentry result [127] discussed at the end of the
next paragraph. Another is an early example due to Kuperberg, his proof of Johansson’s theorem 
[187]. We describe a couple of examples in greater detail. 

Cryptographic protocols usually rely on the empirical hardness of a problem for their security; 
it is rare to be able to prove complete, information theoretic security. When a cryptographic 
protocol is designed based on a new problem, the difficulty of the problem must be established 
before the security of the protocol can be understood. Empirical testing of a problem takes a long 
time. Instead, whenever possible, reduction proofs are given that show that if the new problem 
were solved it would imply a solution to a known hard problem; the proofs show that the solution 
to the known problem can be reduced to a solution of the new problem. Regev [238] designed a 
novel, purely classical cryptographic system based on a certain problem. He was able to reduce 
a known hard problem to this problem, but only by using a quantum step as part of the reduction 
proof. Thus, he has shown that if the new problem is efficiently solvable in any way, there is an 
efficient quantum algorithm for the old problem. But it says nothing about whether there would 
be a classical algorithm. This result is of practical importance; his new cryptographic algorithm is 
a more efficient lattice-based public key encryption system. Lattice-based systems are currently 
the leading candidate for public key systems secure against quantum attacks. Four years after 
Regev’s original result, Peikert provided a completely classical reduction [224]. At the same
conference, however, Gentry presented his spectacular result, a fully homomorphic encryption
system [128], answering a thirty-year open question. As part of his work, he uses a related, but
different, quantum reduction argument for an otherwise completely classical result [127]. 

In another spectacular, if less practical, result, Aaronson found a new solution to a notorious 
conjecture about a purely classical complexity class PP [4]. From 1972 until 1995, this question
remained open. Aaronson defines a new quantum complexity class PostBQP, an extension of 
the standard quantum complexity class BQP, motivated by the use of postselection in certain 
quantum arguments. It takes him a page to show that PostBQP=PP, and then only three lines to 
prove the conjecture. The original 1995 proof, while entirely classical, was significantly more 
complicated. Thus, it seems, for this question at least, the right way to view the classical class PP 
is through the eyes of quantum information processing. 

13.7 Building Quantum Computers 

DiVincenzo developed widely used requirements for the building of a quantum computer.
Obtaining n qubits does not suffice, just as n bits, say n light switches, do not make a classical
computer; the bits or qubits must interact in a controllable fashion. It is relatively easy to obtain 
n qubits, but it is hard to get them to interact with each other and with control devices, while 
preventing them from interacting with anything else. DiVincenzo’s criteria [104] are, roughly:

• Scalable physical system with well-characterized qubits,

• Ability to initialize the qubits in a simple state, 

• Robustness to environmental noise: long decoherence times, much longer than the gate 
operation time, 

• Ability to realize high fidelity universal quantum gates, 

• High-efficiency, qubit-specific measurements. 

Two other criteria were added later in recognition of the need for flying qubits used to transmit 
information between different parts of a quantum computer: 

• Ability to interconvert stationary and flying qubits, 

• Faithful transmission of flying qubits between specified locations. 

DiVincenzo’s criteria are rooted in the standard circuit model of quantum computation.
Perez-Delgado and Kok [227] give more general criteria, including formal operational definitions of a
quantum computer, that are meant to encompass alternative models of quantum computation. 

There are daunting technical difficulties in actually building such a machine. Research teams 
around the world are actively studying ways to build practical quantum computers. The field is 
changing rapidly. It is impossible even for experts to predict which of the many approaches are 
likely to succeed. Both [295] and [157] contain detailed evaluations of the various approaches. 
No one has yet made a detailed proposal that meets all of the DiVincenzo criteria, let alone realized
it in a laboratory. A breakthrough will be needed to go beyond tens of qubits to hundreds of qubits. 

The earliest small quantum computers [176] used liquid NMR [129]. NMR technology was 
already highly advanced due to its use in medicine. The NMR approach uses the nuclear spin 
state of atoms. Many copies of one molecule are contained in a macroscopic amount of liquid. 
A quantum bit is encoded in the average spin state of a large number of nuclei. Each qubit 
corresponds to a particular atom of the molecule, so the atoms for one qubit can be distinguished 
from those of other qubits by their nuclei’s characteristic frequency. The spin states can be 
manipulated by magnetic fields, and the average spin state can be measured with NMR techniques. 
NMR quantum computers work at room temperature. However, liquid NMR has severe scaling
problems: the measured signal scales as $1/2^n$ with the number of qubits $n$. Liquid NMR therefore
appears unlikely to lead implementation efforts much longer, let alone achieve a scalable quantum
computer.

As an example of how hard it is to predict which approaches are most likely to lead to a scalable 
quantum computer, in 2000 optical approaches were considered unpromising. Optical methods 
were recognized as the unrivaled approach for quantum communications applications such as 
quantum key distribution, and also as flying qubits sending information between different parts 
of a quantum computer, because photons do not interact much with other things and so have long 
decoherence times. This same trait, however, means that it is difficult to get photons to interact with 
each other, which made them appear unsuitable as the fundamental qubits on which computation 
would be done. While nonlinear optical materials induce some photon-photon interactions, no
known material has a strong enough nonlinearity to act as a $C_{not}$ gate, and scientists doubt that such
a material will ever be found. Knill, Laflamme, and Milburn’s 2001 paper [179] showed how, by
clever use of measurement, $C_{not}$ gates could be achieved, avoiding the issue of nonlinear optical
elements altogether. While this result, known as the KLM approach, was a huge breakthrough 
for the field of optical quantum computing, major difficulties remained. The overhead required 
by these methods was enormous. In 2004, Nielsen showed how this overhead could be greatly 
reduced by combining the KLM approach with cluster state quantum computing. O’Brien [222] 
gives a brief but insightful overview of optical approaches to quantum computers, now viewed 
as one of the more promising approaches in spite of the many hurdles that remain. 

Ion trap approaches are currently the most advanced of the approaches that appear possibly scalable.
The field has made steady progress. In an ion trap quantum computer [84, 258], individual 
ions, each representing a qubit, are confined by electric fields. Lasers are directed at individual 
ions to perform single-qubit quantum gates and two-qubit operations between adjacent ions. All 
operations necessary for quantum computation have been demonstrated in the laboratory for small 
numbers of ions. To scale this technology, proposed architectures include quantum memory and 
processing elements where qubits are moved back and forth either through physical movement 
of the ions [171] or by using photons to transfer their state [262]. More recently, architectural 
designs for quantum computers have begun to be studied. Van Meter and Oskin [211] survey 
architectural issues and approaches for quantum computers. 

Many other approaches exist, including cavity QED, neutral atom, and various solid state 
approaches. See [295] and [157] for descriptions of these approaches, their experimental status at 
the time the reports were written, and their perceived strengths and weaknesses. Hybrid approaches 
are also being pursued. Of particular interest are interfaces between optical qubits and qubits in 
some of these other forms. 

Once a quantum information processing device is built, it must be tested to determine if it works 
as expected and to learn what sorts of errors occur. Finding good, efficient methods of testing is a 
far from trivial task, given the exponentially large state space, and that measurement affects the 
state. Quantum state tomography studies methods for experimentally characterizing a quantum 
state by examining multiple copies of the state. Quantum process tomography aims to characterize 
experimentally sequences of operations performed by a device. Early work includes Poyatos et al. 
[231, 232] and Chuang and Nielsen [83]. D’Ariano et al. provide a review of quantum tomography
[94]. While a full characterization of an n-qubit system requires exponentially many probes of
the system, some features can be determined with less. Of particular interest is determining the 
decoherence to which a process is subjected. A recent breakthrough by Emerson et al. provides a 
symmetrization process that reduces the number of probes needed to characterize the decoherence 
to only polynomially many [113, 28]. 

The efforts and success in creating highly entangled states for use in quantum information 
processing devices have found a number of other applications, and they have enabled deeper 
experimental exploration of quantum mechanics [157, 295]. Highly entangled states, and the 
improvements in quantum control, have been used in quantum microlithography to affect matter
at scales below the wavelength limit and in quantum metrology to achieve extremely accurate 
sensors. Applications include clock accuracy beyond that of current atomic clocks, which are 
limited by the quantum noise of atoms, optical resolution beyond the wavelength limit, ultrahigh 
resolution spectroscopy, and ultraweak absorption spectroscopy. 

13.8 Simulating Quantum Systems 

A major application of quantum computers is to the simulation of quantum systems. Long before 
we have quantum computers capable of simulating any quantum system, special-purpose quantum 
devices capable of simulating small quantum systems will be built. The simulations run on these 
special purpose devices will have applications in fields ranging from chemistry to biology to 
material science. They will also support the design and implementation of yet larger special 
purpose devices, a process that ideally leads all the way to the building of scalable general-purpose 
quantum computers. 

Early work on quantum simulation of quantum systems includes [285, 196, 289]. Somma et 
al.’s overview [257] discusses what types of physical problems simulation on quantum computers 
could solve. Clearly, a simulation cannot efficiently output the amplitudes of the state, as expressed 
in the standard basis, at all times, since even at just one point in time this information can be 
exponential in the size of the system. What is meant by a full simulation of a quantum system by 
a quantum computer is an algorithm that gives a measurement outcome with the same probability 
as an analogous measurement on the actual system no matter when or what measurement is 
performed. Even on a universal quantum computer, there are limits to what information can 
be gained from a simulation. For some quantities of interest, it is not obvious how to extract 
efficiently that information from a simulation; for some quantities there may be an information 
theoretic barrier, for others algorithmic advances are needed. 
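
The remark that the amplitudes at even one point in time can be exponential in the size of the system is easy to make concrete. The short Python sketch below counts the memory needed to store a full state vector; the function name `state_vector_bytes` and the choice of 16 bytes per amplitude (one double-precision complex number) are our own illustrative assumptions.

```python
def state_vector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory needed to store all 2^n amplitudes of an n-qubit state.

    bytes_per_amplitude=16 assumes one double-precision complex number
    per amplitude (an illustrative storage choice).
    """
    return bytes_per_amplitude * 2 ** n_qubits


# Each added qubit doubles the storage requirement.
for n in (10, 30, 50):
    print(n, "qubits:", state_vector_bytes(n) / 2 ** 30, "GiB")
```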

Many quantum systems can be efficiently simulated classically. After all, we live in a quantum 
world but nevertheless have been able to use classical methods to simulate a wide variety of natural 
phenomena effectively. Some entangled quantum systems can be efficiently simulated classically 
[278]. The question of which quantum systems can be efficiently simulated classically remains 
open. New approaches to classical simulation of quantum systems continue to be developed, 
many benefiting from the quantum information processing viewpoint [249, 204]. The quantum
information processing viewpoint has also led to improvements in a commonly used classical
approach to simulating quantum systems, the DMRG approach [276]. 

While universal quantum computers will be able to simulate a wide variety of quantum
systems, they cannot efficiently simulate some theoretical quantum systems, systems that satisfy
Schrödinger’s equation but have not been found in nature. They cannot simulate efficiently, even
approximately, most quantum systems in the theoretical sense, abstract systems whose dynamics
are described by $e^{-itH}$ for some Hamiltonian $H$. The proof of this fact follows directly from
the fact that most unitary operators are not efficiently implementable. It is conjectured [99], but not
known, that all physically realizable quantum systems are efficiently simulatable on a quantum
computer. If it turns out that this conjecture is wrong and a natural phenomenon is discovered 
that is not efficiently simulatable on quantum computers as we have defined them, then we will 
have to revise our notion of a quantum computer to incorporate this phenomenon. But we would 
also have discovered an additional, potentially powerful, computational resource. 

13.9 Where Does the Power of Quantum Computation Come From? 

Entanglement is the most common answer given as to where the power of quantum computation 
comes from. Other common answers include quantum parallelism, the exponential size of the 
state space, and quantum Fourier transforms. Section 7.6 discussed the inadequacy of quantum 
parallelism and the size of the state space as answers. Quantum Fourier transforms, while central 
to most quantum algorithms, cannot be the answer in light of the result, mentioned in the reference 
section of chapter 7, that quantum Fourier transforms can be efficiently simulated classically. The 
rest of this section is devoted to explaining why the answer entanglement is also unsatisfactory, 
followed by a challenge to our readers to contribute to ongoing efforts to understand what Vlatko 
Vedral [274] terms “the elusive source of quantum effectiveness.”

One reason entanglement is so often cited as the source of quantum computing’s power is 
Jozsa and Linden’s result [167] that any pure state quantum algorithm achieving an exponential 
speedup over classical algorithms must make use of entanglement between a number of qubits 
that increases with the size of the input to the algorithm. In the same paper, however, Jozsa and 
Linden speculate that, in spite of this result, entanglement should not be viewed as the key resource 
for quantum computation. They suggest that similar results can be proved for other properties 
quite different from entanglement. For example, the Gottesman-Knill theorem, discussed in 
section 13.4.1, implies that states that do not have polynomially sized stabilizer descriptions are 
also essential for quantum computation. This property is distinct from entanglement. Since the 
Clifford group contains the $C_{not}$, this set of states includes certain entangled states.

An analog of Jozsa and Linden’s result does not hold for less dramatic improvements over the 
classical case. In fact, improvements can be obtained with no entanglement whatsoever; Meyer 
[213] shows that in the course of the Bernstein-Vazirani algorithm, which achieves an n to 1 
reduction in the number of queries required, no qubits become entangled. More obviously, there 
exist other applications of quantum information processing that require no entanglement. For 
example, the BB84 quantum key distribution protocol makes no use of entanglement. Looking at 
the question from the opposite side, many entangled systems have been shown to be classically 
simulatable [278, 204].
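
The absence of entanglement in the Bernstein-Vazirani algorithm can be checked directly on a small instance. The Python sketch below is a simplified illustration, not Meyer’s argument: it assumes the phase-oracle form of the algorithm, in which a query multiplies the amplitude of $|x\rangle$ by $(-1)^{s \cdot x}$ and no ancilla qubit is simulated, and all function names are our own. It verifies that the state after the single query equals an explicit tensor product of single-qubit states and that the hidden string $s$ is then recovered.

```python
from math import sqrt


def hadamard_all(state):
    """Apply a Hadamard gate to every qubit (fast Walsh-Hadamard transform)."""
    state = list(state)
    h = 1
    while h < len(state):
        for i in range(0, len(state), 2 * h):
            for j in range(i, i + h):
                a, b = state[j], state[j + h]
                state[j], state[j + h] = (a + b) / sqrt(2), (a - b) / sqrt(2)
        h *= 2
    return state


def phase_oracle(state, s):
    """One query: multiply the amplitude of |x> by (-1)^(s.x)."""
    return [amp * (-1) ** bin(x & s).count("1") for x, amp in enumerate(state)]


def product_of_qubits(qubits):
    """Tensor product of single-qubit states, most significant qubit first."""
    state = [1.0]
    for q0, q1 in qubits:
        state = [amp * c for amp in state for c in (q0, q1)]
    return state


n, s = 4, 0b1011                      # hidden string chosen for illustration
start = [1.0] + [0.0] * (2 ** n - 1)  # the state |0000>
after_oracle = phase_oracle(hadamard_all(start), s)

# Qubit i should be (|0> + (-1)^(s_i) |1>) / sqrt(2), independent of the others.
factors = [(1 / sqrt(2), (-1) ** ((s >> i) & 1) / sqrt(2))
           for i in reversed(range(n))]
is_product = all(abs(a - b) < 1e-12
                 for a, b in zip(after_oracle, product_of_qubits(factors)))

final = hadamard_all(after_oracle)
recovered = max(range(2 ** n), key=lambda x: abs(final[x]))
```

The initial and final states are computational basis states and the state after the first Hadamard layer is a uniform product state, so in this form of the algorithm every intermediate state is a product state.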

The cluster state model of quantum computation, on the other hand, suggests the centrality of 
entanglement to quantum computation. Other closely related models with other types of highly 
entangled initial states have been shown to be universal for quantum computation. While it was
known that these states are, in some measures of entanglement, far from maximally entangled,
many researchers conjectured that in theory most classes of sufficiently entangled quantum states
could be used as the basis of universal one-way quantum computation, but that finding
measurement strategies for many of these classes might be prohibitively difficult. This conjecture,
however, turns out to be false. 

Two groups of researchers [142, 64] showed that most quantum states are too entangled to be 
useful as a substrate for universal one-way quantum computation. For a few months, it was thought 
that perhaps these results would not apply to efficiently constructable quantum states, but Low 
[199] quickly exhibited classes of efficiently constructable quantum states that were too entangled 
to be useful as the basis for one-way quantum computation. Most of these states, however, are 
useful for quantum information processing applications such as quantum teleportation. 

These observations prompt two questions: what types of entanglement are useful, and for what. 
As mentioned in chapter 10, multipartite entanglement remains only poorly understood. Another 
intriguing challenge is to find a view of quantum information processing that makes obvious 
its limitations. For example, is there a vantage point from which the $\Omega(\sqrt{N})$ lower bound on
quantum algorithms for exhaustive search, proved in section 9.3, becomes a one-line observation? 
The route toward understanding what aspects of quantum mechanics are responsible for the 
power of quantum information processing is even less obvious. We hope readers of this book will 
contribute toward an improved understanding of these fundamental questions. 

13.10 What if Quantum Mechanics Is Not Quite Correct? 

Quantum mechanics may be wrong. Physicists have not yet understood how to reconcile quantum 
mechanics with general relativity. A complete physical theory would need to make modifications 
to one of general relativity or quantum mechanics, possibly both. Any modifications to quantum 
mechanics would have to be subtle, however; quantum mechanics is one of the most tested theories 
of all time, and its predictions hold to great accuracy. Most of the predictions of quantum
mechanics will continue to hold, at least approximately, once a more complete theory is found. Since no
one knows how to reconcile the two theories, no one knows what, if any, modifications would 
be necessary. Once the new physical theory is known, its computational power can be analyzed. 
In the meantime, theorists have looked at what computational power would be possible if certain 
changes in quantum mechanics were made. 

So far these changes imply greater computational power rather than less; computers built on 
those principles could do everything a quantum computer could do and substantially more. For 
example, Abrams and Lloyd [7] showed that if quantum mechanics were nonlinear, even slightly, 
computation using that nonlinearity could solve all problems in the class #P, a class that contains 
all NP problems and substantially more, in polynomial time. Aaronson [4] showed that if a certain
exponent in the axioms of quantum mechanics were anything other than 2, all PP problems,
another class substantially larger than NP, would be solvable in polynomial time. These results
mean that modification to quantum mechanics would not necessarily destroy the power obtained
by computers making use of these physical principles; in fact, in many cases it would increase
the power. With these results in mind, Aaronson [5] suggests that limits on computational power 
should be considered a fundamental principle guiding our search for physical theories of the 
universe, much as is done for the laws of thermodynamics. 

Many intriguing questions as to the extent and source of the power of quantum computation 
remain, and they are likely to remain for many years while we humans struggle to understand 
what Nature allows us to compute efficiently and why. 



APPENDIXES 




A 


Some Relations Between Quantum Mechanics and Probability Theory 


The inherently probabilistic nature of quantum mechanics is well known, but the close
relationship between the formal structures underlying quantum mechanics and probability theory
is surprisingly neglected. This appendix describes standard probability theory in a somewhat 
nonstandard way, in a language closer to the standard way of describing quantum mechanics. 
This rephrasing illuminates the parallels and differences between the two theories. Probability 
theory helps in understanding quantum mechanics, not only by placing structures such as tensor 
products in a more familiar context, but also because the mathematical formalisms underlying 
quantum theory can be precisely and usefully viewed as an extension of probability theory. This 
view clarifies relationships between quantum theory and probability theory, including differences 
between entanglement and classical correlation. 

A.1 Tensor Products in Probability Theory 

Tensor products are rarely mentioned in probability textbooks, but the tensor product is as much 
a part of probability theory as of quantum mechanics. The tensor product structure inherent in 
probability theory should be stressed more often; one of the sources of mistaken intuition about 
probabilities is a tendency to try to impose the more familiar direct product structure on what is 
actually a tensor product structure. 

Let $A$ be a finite set of $n$ elements. A probability distribution $\mu$ on $A$ is a function

$$\mu : A \to [0, 1]$$

such that $\sum_{a \in A} \mu(a) = 1$. The space $\mathcal{P}^A$ of all probability distributions over $A$ has dimension
$n - 1$. We can view $\mathcal{P}^A$ as the $(n-1)$-dimensional simplex
$\sigma_{n-1} = \{x \in \mathbf{R}^n \mid x_i \ge 0,\ x_1 + x_2 + \cdots + x_n = 1\}$,
which is contained in the $n$-dimensional space $\mathbf{R}^A$, the space of all functions from $A$
to $\mathbf{R}$,

$$\mathbf{R}^A = \{f : A \to \mathbf{R}\}$$

(see figure A.1). For $n = 2$, the simplex $\sigma_{n-1}$ is the line segment from $(1, 0)$ to $(0, 1)$. Each vertex
of the simplex corresponds to an element $a \in A$ in that it represents the probability distribution
that is 1 on $a$ and 0 for all other elements of $A$. An arbitrary probability distribution $\mu$ maps to
the point in the simplex $x = (\mu(a_1), \mu(a_2), \ldots, \mu(a_n))$.

Figure A.1
Simplex $\sigma_2$, which corresponds to the set of all probability distributions over a set $A$ of three elements.

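Let $B$ be a finite set of $m$ elements. Let $A \times B$ be the Cartesian product
$A \times B = \{(a, b) \mid a \in A, b \in B\}$. What is the relation between $\mathcal{P}^{A \times B}$, the space of all
probability distributions over $A \times B$, and the spaces $\mathcal{P}^A$ and $\mathcal{P}^B$? The tempting guess is not
correct: $\mathcal{P}^{A \times B} \neq \mathcal{P}^A \times \mathcal{P}^B$.
The following dimension check shows that this relationship cannot hold. First, consider the
relationship between $\mathbf{R}^{A \times B}$ and $\mathbf{R}^A$ and $\mathbf{R}^B$. Since $A \times B$ has cardinality
$|A \times B| = |A||B| = nm$, $\mathbf{R}^{A \times B}$ has dimension $nm$, which in general is not equal to $n + m$, the
dimension of $\mathbf{R}^A \times \mathbf{R}^B$.
Since $\dim(\mathcal{P}^A) = \dim(\mathbf{R}^A) - 1$, $\dim(\mathcal{P}^{A \times B}) = nm - 1$, which is not equal to $n + m - 2$, the
dimension of $\mathcal{P}^A \times \mathcal{P}^B$, whenever $n, m \ge 2$. Thus,

$$\mathcal{P}^{A \times B} \neq \mathcal{P}^A \times \mathcal{P}^B.$$

Instead, $\mathbf{R}^{A \times B}$ is the tensor product $\mathbf{R}^A \otimes \mathbf{R}^B$ of $\mathbf{R}^A$ and $\mathbf{R}^B$, and
$\mathcal{P}^{A \times B} \subset \mathbf{R}^A \otimes \mathbf{R}^B$. Before
showing that this relationship holds, we give an example to help build intuition.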


Example A.1.1 Let $A_0 = \{0_0, 1_0\}$, $A_1 = \{0_1, 1_1\}$, and $A_2 = \{0_2, 1_2\}$. Let $1_0$ and $0_0$ correspond
to whether or not the next person you meet is interested in quantum mechanics, $A_1$ to whether
she knows the solution to the Monty Hall problem, and $A_2$ to whether she is at least 5'6" tall. So
$1_0 1_1 0_2$ corresponds to someone under 5'6" who is interested in quantum mechanics and knows
the solution to the Monty Hall problem. We often write 110 instead of $1_0 1_1 0_2$; the subscripts are
implied by the position. A probability distribution over the set of eight possibilities, $A_0 \times A_1 \times A_2$,
has form

$$p = (p_{000}, p_{001}, p_{010}, p_{011}, p_{100}, p_{101}, p_{110}, p_{111}).$$

More generally, a probability distribution over $A_0 \times A_1 \times \cdots \times A_{k-1}$, where the $A_i$ are all
2-element sets, is a vector of length $2^k$. We always order the entries so that the binary subscripts increase.
Thus, the dimension of the space of probability distributions over the Cartesian product of $n$
two-element sets increases exponentially with $n$.
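
The ordering convention in this example can be sketched in a few lines of Python. This is only an illustration; the helper name `outcomes` is an assumption introduced here, not notation from the text.

```python
from itertools import product

def outcomes(k):
    """All outcomes of k two-element sets, ordered so the binary labels increase."""
    return ["".join(bits) for bits in product("01", repeat=k)]

# For three traits there are 2**3 = 8 outcomes, from "000" up to "111".
labels = outcomes(3)

# The position of a label in the vector is the label read as a binary number.
position_of_110 = labels.index("110")
```

With this convention, a distribution over $k$ two-element sets is simply a list of $2^k$ probabilities indexed by position.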


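This paragraph shows that $\mathbf{R}^{A \times B} = \mathbf{R}^A \otimes \mathbf{R}^B$. Given functions $f : A \to \mathbf{R}$ and $g : B \to \mathbf{R}$,
define the tensor product $f \otimes g : A \times B \to \mathbf{R}$ by $(a, b) \mapsto f(a)g(b)$. The reader should check
that this definition satisfies the axioms for a tensor product. Furthermore, the linear combination
of functions in $\mathbf{R}^{A \times B}$ is a function in $\mathbf{R}^{A \times B}$. Thus $\mathbf{R}^A \otimes \mathbf{R}^B \subseteq \mathbf{R}^{A \times B}$. Conversely, we must show
that any function $h \in \mathbf{R}^{A \times B}$ can be written as a linear combination of functions $f_i \otimes g_i$, where
$f_i \in \mathbf{R}^A$ and $g_i \in \mathbf{R}^B$. Define a family of functions $f_b^A \in \mathbf{R}^A$, one for each $b \in B$, by

$$f_b^A : A \to \mathbf{R}, \qquad a \mapsto h(a, b).$$

Similarly, for each $a \in A$, define

$$g_a^B : B \to \mathbf{R}, \qquad b \mapsto h(a, b).$$

Furthermore, define the probability distributions

$$\delta_{a'}^A : A \to \mathbf{R}, \qquad a \mapsto \begin{cases} 1 & \text{if } a = a' \\ 0 & \text{otherwise} \end{cases}$$

and

$$\delta_{b'}^B : B \to \mathbf{R}, \qquad b \mapsto \begin{cases} 1 & \text{if } b = b' \\ 0 & \text{otherwise.} \end{cases}$$

Then $h(a, b) = \sum_{a' \in A} \delta_{a'}^A(a)\, g_{a'}^B(b)$, so $h = \sum_{a' \in A} \delta_{a'}^A \otimes g_{a'}^B$. Therefore, $h \in \mathbf{R}^A \otimes \mathbf{R}^B$. For
completeness, we mention that by symmetry $h = \sum_{b' \in B} f_{b'}^A \otimes \delta_{b'}^B$.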

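Now let us restrict our attention to probability distributions. If $\mu$ and $\nu$ are probability
distributions, then so is $\mu \otimes \nu$:

$$\sum_{(a,b) \in A \times B} (\mu \otimes \nu)(a, b) = \sum_{(a,b) \in A \times B} \mu(a)\nu(b) = \sum_{a \in A} \sum_{b \in B} \mu(a)\nu(b) = \Big(\sum_{a \in A} \mu(a)\Big)\Big(\sum_{b \in B} \nu(b)\Big) = 1.$$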





Furthermore, the linear combination of probability distributions is a probability distribution as
long as the linear factors are nonnegative and sum to 1. Conversely, we show that any probability
distribution $\eta \in \mathcal{P}^{A \times B}$
is the linear combination of tensor products of probability distributions in $\mathcal{P}^A$ and $\mathcal{P}^B$ with linear
factors summing to 1. Define a family of probability distributions, one for each $a \in A$,

$$h_a^B : B \to \mathbf{R}, \qquad b \mapsto \frac{\eta(a, b)}{\sum_{b' \in B} \eta(a, b')}.$$

Let $c_a = \sum_{b \in B} \eta(a, b)$ and $c_b = \sum_{a \in A} \eta(a, b)$. Observe that $\sum_{a \in A} c_a = 1$. Then

$$\eta(a, b) = \sum_{a' \in A} c_{a'}\, h_{a'}^B(b)\, \delta_{a'}^A(a), \qquad \text{so} \qquad \eta = \sum_{a' \in A} c_{a'}\, \delta_{a'}^A \otimes h_{a'}^B.$$

Since $\delta_{a'}^A$ is a probability distribution in $\mathcal{P}^A$, every probability distribution over $A \times B$ is in
$\mathcal{P}^A \otimes \mathcal{P}^B$.
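
The decomposition just described can be checked numerically. The following is a minimal sketch using exact fractions; the function names `decompose` and `reconstruct` and the sample distribution `eta` are illustrative assumptions, not part of the text. Elements $a$ with $c_a = 0$ are skipped, since they contribute nothing to the sum.

```python
from fractions import Fraction as F

def decompose(eta, A, B):
    """Return (c, h) with eta(a, b) = c[a] * h[a][b], each h[a] a distribution on B."""
    c = {a: sum(eta[(a, b)] for b in B) for a in A}
    # h[a] is only defined where c[a] is nonzero (no division by zero).
    h = {a: {b: eta[(a, b)] / c[a] for b in B} for a in A if c[a] != 0}
    return c, h

def reconstruct(c, h, A, B):
    """Evaluate sum over a' of c[a'] * (delta_{a'} tensor h[a']) at every (a, b)."""
    return {(a, b): (c[a] * h[a][b] if a in h else F(0)) for a in A for b in B}

A, B = (0, 1), (0, 1)
eta = {(0, 0): F(1, 2), (0, 1): F(1, 4), (1, 0): F(0), (1, 1): F(1, 4)}
c, h = decompose(eta, A, B)
```

Here the weights `c` sum to 1, each `h[a]` sums to 1, and `reconstruct(c, h, A, B)` returns the original `eta`.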

A joint distribution $\mu \in \mathcal{P}^{A \times B}$ is independent or uncorrelated with respect to the decomposition
$\mathcal{P}^A \otimes \mathcal{P}^B$ if it can be written as a tensor product $\mu_A \otimes \mu_B$ of distributions $\mu_A \in \mathcal{P}^A$ and $\mu_B \in \mathcal{P}^B$.
The vast majority of joint distributions do not have this form, in which case they are correlated.
For any joint distribution $\mu \in \mathcal{P}^{A \times B}$, define a marginal distribution $\mu_A \in \mathcal{P}^A$ by

$$\mu_A : a \mapsto \sum_{b \in B} \mu(a, b).$$

An uncorrelated distribution is the tensor product of its marginals. Other distributions cannot be
reconstructed from their marginals; information has been lost. One of the sources of mistaken
intuition about probabilities is a tendency to try to impose the more familiar direct product
structure, which does support reconstruction, on what is actually a tensor product structure; the
relationship between a distribution and its marginals is properly understood only within a tensor product
structure.
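
To make the loss of information concrete, here is a small sketch in Python. The names `tensor`, `marginals`, and `is_uncorrelated` are assumptions made for illustration. It builds two joint distributions with identical marginals, only one of which is the tensor product of those marginals.

```python
from fractions import Fraction as F
from itertools import product

def tensor(mu, nu):
    """Tensor product of two distributions given as dicts: (a, b) -> mu(a) * nu(b)."""
    return {(a, b): mu[a] * nu[b] for a, b in product(mu, nu)}

def marginals(joint):
    """Marginal distributions of a joint distribution over pairs (a, b)."""
    mu_a, mu_b = {}, {}
    for (a, b), p in joint.items():
        mu_a[a] = mu_a.get(a, 0) + p
        mu_b[b] = mu_b.get(b, 0) + p
    return mu_a, mu_b

def is_uncorrelated(joint):
    """True exactly when the joint distribution equals the tensor product of its marginals."""
    mu_a, mu_b = marginals(joint)
    return tensor(mu_a, mu_b) == joint

# Two fair, independent coins versus two coins that always agree.
fair = {"H": F(1, 2), "T": F(1, 2)}
independent = tensor(fair, fair)
agree = {("H", "H"): F(1, 2), ("H", "T"): F(0), ("T", "H"): F(0), ("T", "T"): F(1, 2)}
```

Both `independent` and `agree` have fair-coin marginals, so the marginals alone cannot distinguish them; only `independent` passes `is_uncorrelated`.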

A distribution $\mu : A \to \mathbf{R}$ that is concentrated entirely at one element is said to be pure; on a
set $A$ of $n$ elements there are exactly $n$ pure distributions $\mu_a : A \to [0, 1]$, one for each element
of $A$, where

$$\mu_a : a' \mapsto \begin{cases} 1 & \text{if } a' = a \\ 0 & \text{otherwise.} \end{cases}$$

These are exactly the distributions that correspond to the vertices of the simplex. All other 
distributions are said to be mixed. 

When an observation is made, the probability distribution is updated accordingly. All states 
incompatible with the observation are ruled out, and the remaining probabilities are normalized 
to sum to 1. Much noise is made about the collapse of the state due to quantum measurement. But
this collapse occurs in classical probability; it is known as updating a probability distribution in 
light of new information. 


Example A.1.2 Suppose your friend is about to toss two fair coins. The probability distribution
for the four outcomes HH, HT, TH, and TT is $\mu_I = (1/4, 1/4, 1/4, 1/4)$. After she tosses the
two coins, she tells you that the two coins agreed. To compute the new probability distribution,
the possibilities incompatible with your friend’s observation, HT and TH, are ruled out, and the
remaining possibilities are normalized to sum to 1, resulting in the probability distribution
$\mu_F = (1/2, 0, 0, 1/2)$.
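
The update rule used in this example (rule out incompatible outcomes, then renormalize) can be sketched as follows. The function name `update` and its predicate argument are illustrative assumptions.

```python
from fractions import Fraction as F

def update(dist, compatible):
    """Condition a distribution on an observation.

    `compatible` is a predicate on outcomes; incompatible outcomes are set to
    probability 0 and the remaining probabilities are renormalized to sum to 1.
    """
    kept = {x: (p if compatible(x) else F(0)) for x, p in dist.items()}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("observation has probability zero")
    return {x: p / total for x, p in kept.items()}

# Two fair coins; the observation is that the two coins agreed.
mu_initial = {"HH": F(1, 4), "HT": F(1, 4), "TH": F(1, 4), "TT": F(1, 4)}
mu_final = update(mu_initial, lambda outcome: outcome[0] == outcome[1])
```

The result assigns probability 1/2 to each of HH and TT and 0 to HT and TH.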


Example A.1.3 Let us return to the example of the traits for the next person you meet. Unless
you know all of these traits, the distribution $\mu_I = (p_{000}, \ldots, p_{111})$ is a mixed distribution. When
you meet the person you can observe her traits. Once you have made these observations, the
distribution collapses to a pure distribution. For example, if the person is interested in quantum
mechanics, does not know the solution to the Monty Hall problem, and is 5'8", the collapsed
distribution is $\mu_F = (0, 0, 0, 0, 0, 1, 0, 0)$.


The true surprise in quantum mechanics is that quantum states cannot generally be modeled 
by probability distributions — the content of Bell’s theorem. Overly simplified versions of the 
EPR paradox, in which only one basis is considered, reduce to an unsurprising classical result 
that instant, faster-than-light knowledge of a faraway state may be possible upon the observation 
of a local state. 


Example A.1.4 Suppose someone prepares two sealed envelopes with identical pieces of paper 
and sends them to opposite sides of the universe. Half the time, both envelopes contain 0; half 
the time, 1. The initial distribution is $\mu_I = (1/2, 0, 0, 1/2)$. If someone then opens one of the
envelopes and observes a 0, the state of the contents of the other envelope is immediately known
(known faster than light can travel between the envelopes), and the distribution after the
observation is $\mu_F = (1, 0, 0, 0)$.


To understand fully the relationship between quantum mechanics and probability theory, it is
useful to view probability distributions as operators. Consider the set of linear operators
$\mathcal{M}^A = \{M : \mathbf{R}^A \to \mathbf{R}^A\}$. To every function $f : A \to \mathbf{R}$, there is an associated operator
$M_f : \mathbf{R}^A \to \mathbf{R}^A$ given by $M_f : g \mapsto fg$. In particular, for any probability distribution $\mu$ on $A$, there is an
associated operator $M_\mu : \mathbf{R}^A \to \mathbf{R}^A$. An operator $M$ is said to be a projector if $M^2 = M$. The set
of probability distributions $\mu$ whose corresponding operators $M_\mu$ are projectors is exactly the set
of pure distributions. The matrix for the operator corresponding to a function is always diagonal.
For a probability distribution, this matrix is trace 1 as well as diagonal. For example, the operator
corresponding to the probability distribution $\mu_I = (1/2, 0, 0, 1/2)$ has matrix

$$\begin{pmatrix} 1/2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1/2 \end{pmatrix}.$$

Updating the probability distribution with information from an observation involves setting some 
of the matrix entries to zero and renormalizing the diagonal to sum to 1. 
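
As a rough illustration of the operator view, the sketch below builds the diagonal matrix of a distribution and tests the projector condition $M^2 = M$. The helper names (`operator_matrix`, `matmul`, `is_projector`) are assumptions introduced for this example.

```python
from fractions import Fraction as F

def operator_matrix(mu):
    """Diagonal matrix of the operator g -> mu * g (pointwise product), as nested lists."""
    n = len(mu)
    return [[mu[i] if i == j else F(0) for j in range(n)] for i in range(n)]

def matmul(x, y):
    """Product of two square matrices given as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def is_projector(m):
    """An operator M is a projector when M * M == M."""
    return matmul(m, m) == m

mixed = operator_matrix([F(1, 2), F(0), F(0), F(1, 2)])  # a mixed distribution
pure = operator_matrix([F(1), F(0), F(0), F(0)])         # a pure distribution
trace_mixed = sum(mixed[i][i] for i in range(4))
```

Both matrices are diagonal with trace 1, but only the matrix of the pure distribution is a projector.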


Example A.1.5 The matrix for the initial probability distribution in example A.1.2 is

$$\begin{pmatrix} 1/4 & 0 & 0 & 0 \\ 0 & 1/4 & 0 & 0 \\ 0 & 0 & 1/4 & 0 \\ 0 & 0 & 0 & 1/4 \end{pmatrix}.$$

The matrix for the updated probability distribution after the measurement involves setting the
probabilities of HT and TH to 0 and renormalizing the matrix to obtain a trace 1 matrix:

$$\begin{pmatrix} 1/2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1/2 \end{pmatrix}.$$


Example A.1.6 The matrix for the initial probability distribution in example A.1.4 is

$$\begin{pmatrix} 1/2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1/2 \end{pmatrix}.$$

The matrix for the updated probability distribution after the envelope has been opened involves
setting the probability of both envelopes containing 1 to 0 and renormalizing the matrix to obtain
a trace 1 matrix:

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$

A.2 Quantum Mechanics as a Generalization of Probability Theory

The remainder of this appendix relies on the density operator formalism and the notions of pure
and mixed quantum states from section 10.1. This section describes how pure and mixed quantum
states generalize the classical notion of pure and mixed probability distributions. This viewpoint 
helps clarify the distinction between quantum entanglement and classical correlations in mixed 
quantum states. 

Let $\rho$ be a density operator. Section 10.1.2 showed that every density operator $\rho$ can be written
as a probability distribution over pure quantum states $\sum_i p_i |\psi_i\rangle\langle\psi_i|$, where the $|\psi_i\rangle$ are mutually
orthogonal eigenvectors of $\rho$, and the $p_i$ are the eigenvalues, with $p_i \in [0, 1]$ and $\sum_i p_i = 1$.
Conversely, any probability distribution $\mu$ over a set of orthogonal quantum states
$|\psi_1\rangle, |\psi_2\rangle, \ldots, |\psi_L\rangle$
with $\mu : |\psi_i\rangle \mapsto p_i$ has a corresponding density operator $\rho_\mu = \sum_i p_i |\psi_i\rangle\langle\psi_i|$. In the basis $\{|\psi_i\rangle\}$,
the density operator $\rho_\mu$ is diagonal:

$$\begin{pmatrix} p_1 & & & \\ & p_2 & & \\ & & \ddots & \\ & & & p_L \end{pmatrix}.$$

Thus, a probability distribution over a set of orthonormal quantum states $\{|\psi_i\rangle\}$ can be viewed as
a trace 1 diagonal matrix acting on $\mathbf{R}^L$. Under the isomorphism between $\mathbf{R}^L$ and the subspace of
$V$ generated by $|\psi_1\rangle, |\psi_2\rangle, \ldots, |\psi_L\rangle$, the density operator $\rho_\mu$ realizes the operator $M_\mu$ of section
A.1. In this way, density operators are a direct generalization
of probability distributions.

Although every density operator can be viewed as a probability distribution over a set of 
orthogonal quantum states, this representation is not unique in general. More importantly, for 
most pairs of density operators $\rho_1$ and $\rho_2$, there is no basis over which both $\rho_1$ and $\rho_2$ are
diagonal. Thus, although each density operator of dimension N can be viewed as a probability 
distribution over N states, the space of all density operators is much larger than the space of 
probability distributions over N states; the space of all density operators contains many different 
overlapping copies of the space of probability distributions over N states, one for each orthonormal 
basis. 

Let $\rho : V \to V$ be a density operator. By exercise 10.4 a density operator $\rho$ corresponds to
a pure state if and only if it is a projector. This statement is analogous to that for probability 
distributions; the pure states correspond exactly to rank 1 density operators, and mixed states 
have rank greater than 1. As explained in section 10.3, density operators are also used to model 
probability distributions over pure states, particularly probability distributions over the possible 
outcomes of a measurement yet to be performed. This use is analogous to the classical use of 
probability distributions to model the probabilities of possible traits before they can be observed. 

A pure quantum state $|\psi\rangle$ is entangled with respect to the tensor decomposition into single
qubits if it cannot be written as the tensor product of single-qubit states. For a mixed quantum
state, it is important to determine if all of its correlation comes from being a mixture in the classical
sense or if it is also correlated in a quantum fashion. A mixed quantum state
$\rho : V \otimes W \to V \otimes W$
is said to be uncorrelated with respect to the decomposition $V \otimes W$ if $\rho = \rho_V \otimes \rho_W$ for some
density operators $\rho_V : V \to V$ and $\rho_W : W \to W$. Otherwise $\rho$ is said to be correlated. A mixed
quantum state $\rho$ is said to be separable if it can be written
$\rho = \sum_{j=1}^{L} p_j |\psi_j\rangle\langle\psi_j| \otimes |\phi_j\rangle\langle\phi_j|$
where $|\psi_j\rangle \in V$ and $|\phi_j\rangle \in W$. In other words, $\rho$ is separable if all the correlation comes from
its being a classical mixture of uncorrelated quantum states. If a mixed state $\rho$ is not separable, it is
entangled. For example, the mixed state $\rho_{cc} = \frac{1}{2}(|00\rangle\langle 00| + |11\rangle\langle 11|)$ is classically correlated
but not entangled, whereas the Bell state
$|\Phi^+\rangle\langle\Phi^+| = \frac{1}{2}(|00\rangle + |11\rangle)(\langle 00| + \langle 11|)$ is entangled.
The marginals of a pure distribution are always pure, but the analogous statement is not true for
quantum states; all of the partial traces of a pure state are pure only if the original pure state
was not entangled. The partial traces of the Bell state $|\Phi^+\rangle$, a pure state, are not pure. Most pure
quantum states are entangled, exhibiting quantum correlations with no classical analog. All pure
probability distributions are completely uncorrelated.
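
The contrast between the classically correlated state and the Bell state can be checked with a short calculation. The sketch below assumes the basis order $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ and uses illustrative helper names (`partial_trace_second`, `is_pure`); both example matrices happen to be real, so exact fractions suffice.

```python
from fractions import Fraction as F

def partial_trace_second(rho):
    """Trace out the second qubit of a two-qubit density matrix (4x4 nested lists).

    Basis order is |00>, |01>, |10>, |11>, so index = 2 * first + second.
    """
    return [[sum(rho[2 * i + k][2 * j + k] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matmul(x, y):
    """Product of two square matrices given as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def is_pure(rho):
    """A density operator is pure exactly when it is a projector: rho * rho == rho."""
    return matmul(rho, rho) == rho

h, z = F(1, 2), F(0)
# rho_cc = 1/2 (|00><00| + |11><11|): a classical mixture, not entangled.
rho_cc = [[h, z, z, z], [z, z, z, z], [z, z, z, z], [z, z, z, h]]
# |Phi+><Phi+| with |Phi+> = (|00> + |11>)/sqrt(2): pure and entangled.
bell = [[h, z, z, h], [z, z, z, z], [z, z, z, z], [h, z, z, h]]

reduced_bell = partial_trace_second(bell)
reduced_cc = partial_trace_second(rho_cc)
```

The Bell state is pure while `rho_cc` is mixed, yet both have the same reduced state on the first qubit, half the identity, which is mixed. The partial trace of a pure entangled state is therefore not pure.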

Classical and quantum analogs:

| Classical probability | Quantum mechanics |
|---|---|
| probability distribution $\mu$, viewed as operator $M_\mu$ | density operator $\rho$ |
| pure distribution: $M_\mu$ is a projector | pure state: $\rho$ is a projector |
| simplex: $\sigma_{n-1} = \{x \in \mathbf{R}^n \mid x_i \ge 0,\ x_1 + \cdots + x_n = 1\}$ | Bloch region: set of trace 1 positive Hermitian operators |
| marginal distribution | partial trace |
| A distribution is uncorrelated if it is the tensor product of its marginals | A state is uncorrelated if it is the tensor product of its partial traces |

Key difference:

| Classical | Quantum |
|---|---|
| pure distributions are always uncorrelated | pure states contain no classical correlation but can be entangled |
| A marginal of a pure distribution is a pure distribution | The partial trace of a pure state may be a mixed state |

A.3 References

The view of quantum mechanics as an extension of probability theory is discussed in many 
quantum mechanics references, particularly those concerned with the deeper mathematical aspects 
of the theory. Aaronson gives a playful account [3]. Rieffel treats this subject in [241]. In their
chapter on quantum probability, Kitaev et al. outline parallels between quantum mechanics and 
probability theory [175]. Kuperberg’s A Concise Introduction to Quantum Probability, Quantum 
Mechanics, and Quantum Computation also serves as an excellent reference [188]. Sudbery 
[267] gives a brief account in his section on Statistical Formulations of Classical and Quantum
Mechanics. An early account of some of these ideas can be found in Mackey’s Mathematical 
Foundations of Quantum Mechanics. Strocchi’s An Introduction to the Mathematical Structure 
of Quantum Mechanics gives a detailed and readable account [266]. A number of papers by 
Summers, including [236], address relations and distinctions between quantum mechanics and 
probability theory. 

A.4 Exercises 

Exercise A.1. Show that an independent joint distribution is the tensor product of its marginals.

Exercise A.2. Show that a general distribution cannot be reconstructed from its marginals. Exhibit
three distinct distributions with the same marginals. 

Exercise A.3. 

a. Show that the tensor product of a pure distribution is pure. 

b. Show that any distribution is a linear combination of pure distributions. Conclude that the set 
of distributions on a finite set A is convex. 

c. Show that any pure distribution on a joint system A x B is uncorrelated. 

d. A distribution is said to be extremal if it cannot be written as a linear combination of other 
distributions. Show that the extremal distributions are exactly the pure distributions. 

Exercise A.4. Show that the probability distributions $\mu$ whose corresponding operators $M_\mu$ are
projectors are exactly the pure distributions.

Exercise A.5. For each of the states $|0\rangle$, $|-\rangle$, and $|i\rangle = \frac{1}{\sqrt{2}}(|0\rangle + i|1\rangle)$, give the matrix for the
corresponding density operator in the standard basis, and write each of these states as a probability
distribution over pure states. For which of these states is this distribution unique?

Exercise A.6. 

a. Give an example of three density operators no two of which can be simultaneously diagonalized 
in that there does not exist a basis with respect to which both are diagonal. 

b. Show that if a set of density operators commute, then they can be simultaneously diagonalized. 

Exercise A.7. Show that the binary operator $f \otimes g : (a, b) \mapsto f(a)g(b)$ for $f \in \mathbf{R}^A$ and $g \in \mathbf{R}^B$
satisfies the relations defining a tensor product structure on $\mathbf{R}^{A \times B}$ given in section 3.1.2.

Exercise A.8. Show that a separable pure state must be uncorrelated. 

Exercise A.9. Show that if a density operator $\rho$ on $V \otimes W$ is uncorrelated with respect to the
tensor decomposition V ® W, then it is the tensor product of its partial traces with respect to V 
and W. 


Solving the Abelian Hidden Subgroup Problem


This appendix covers the solution to the Abelian hidden subgroup problem using a generalization 
of Shor’s factoring algorithm. Recall from box 8.4 that any finite Abelian group can be written 
as the product of cyclic groups. 

Finite Abelian Hidden Subgroup Problem Let $G$ be a finite Abelian group with cyclic decomposition
$G = \mathbf{Z}_{n_0} \times \cdots \times \mathbf{Z}_{n_L}$. Suppose $G$ contains a subgroup $H < G$ that is implicitly defined by
a function $f$ on $G$ in that $f$ is constant and distinct on every coset of $H$. Find a set of generators
for $H$.

This appendix shows that, for finite Abelian groups, if

$$U_f : |g\rangle|0\rangle \to |g\rangle|f(g)\rangle$$

can be computed in poly-log time, then generators for $H$ can be computed in poly-log time.

This appendix makes use of deeper aspects of group theory, such as group representations,
than the rest of the book. Basic elements of group theory were reviewed in the boxes
accompanying section 8.6. Section B.1 reviews group representations of finite Abelian groups,
including Schur’s lemma. Section B.2 defines quantum Fourier transforms over finite Abelian 
groups. Section B.3 explains how these quantum Fourier transforms enable the solution of the 
Abelian hidden subgroup problem. Section B.4 looks at Simon’s problem and Shor’s factoring 
algorithm as instances of this general solution to the Abelian hidden subgroup problem. The 
appendix concludes in section B.5 with a few remarks on the non-Abelian hidden subgroup 
problem. 

B.1 Representations of Finite Abelian Groups

A representation of an Abelian group $G$ is a group homomorphism $\chi$ from $G$ to the multiplicative
group of complex numbers $\mathbf{C}$:

$$\chi : G \to \mathbf{C}, \qquad \chi(gh) = \chi(g)\chi(h) \text{ for all } g, h \in G.$$

More generally, representations of groups are group homomorphisms into the space of linear
operators on a vector space. However, in the Abelian case it suffices to consider only characters, 
the representations into the multiplicative group of complex numbers. 

For the additive group $\mathbf{Z}_n$, the homomorphism condition implies that any representation $\chi$
of $\mathbf{Z}_n$ must send $0 \mapsto 1$, and the generator 1 of $\mathbf{Z}_n$ must map to one of the $n$ $n$th roots of unity
since

$$\sum_{i=1}^{n} 1 = 0$$

implies

$$\prod_{i=1}^{n} \chi(1) = \chi(0) = 1.$$

Since $\chi(1)$ determines the image of all other elements in $\mathbf{Z}_n$ there can be at most $n$ representations.
Any $n$th root of unity works, so the $n$ representations $\chi_j$

$$\chi_j : x \mapsto \exp\left(\frac{2\pi i}{n} jx\right)$$

for all $j \in \mathbf{Z}_n$ form the complete set of representations of $\mathbf{Z}_n$. Many of the representations are
not one-to-one: for example the trivial representation that we have labeled by $0 \in \mathbf{Z}_n$ sends all
group elements to 1. We have labeled the representations by group elements $j \in \mathbf{Z}_n$ in one way.
We use this labeling as our standard labeling throughout this appendix. Other labelings by group
elements are possible.

More generally, for any Abelian group, the homomorphism condition $\chi(gh) = \chi(g)\chi(h)$
implies that $\chi(e) = 1$, $\chi(g^{-1}) = \overline{\chi(g)}$, and that every $\chi(g)$ is a $k$th root of unity, where
$k$ is the order of $g$. An Abelian group of order $|G|$ has exactly $|G|$ distinct representations
$\chi_i$.


Example B.1.1 The two representations for $\mathbf{Z}_2$ are $\chi_j(x) = (-1)^{jx}$, or

$$\chi_0(x) = 1, \qquad \chi_1(x) = \begin{cases} 1 & \text{if } x = 0 \\ -1 & \text{if } x = 1. \end{cases}$$


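Example B.1.2 The four representations $\chi_j(x) = \exp(2\pi i\, jx/4)$ of $\mathbf{Z}_4$ are given in the following
table:

|          | 0 | 1    | 2    | 3    |
|----------|---|------|------|------|
| $\chi_0$ | 1 | 1    | 1    | 1    |
| $\chi_1$ | 1 | $i$  | $-1$ | $-i$ |
| $\chi_2$ | 1 | $-1$ | 1    | $-1$ |
| $\chi_3$ | 1 | $-i$ | $-1$ | $i$  |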
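
The table for $\mathbf{Z}_4$ can be reproduced from the formula $\chi_j(x) = \exp(2\pi i\, jx/n)$. The following Python sketch is illustrative only; the function names `character` and `character_table` are assumptions, and values are floating point, so comparisons should use a tolerance.

```python
import cmath

def character(n, label):
    """Return the representation chi_label of Z_n as a function x -> exp(2*pi*i*label*x/n)."""
    return lambda x: cmath.exp(2j * cmath.pi * label * x / n)

def character_table(n):
    """Rows are chi_0, ..., chi_{n-1}; columns are the group elements 0, ..., n-1."""
    return [[character(n, label)(x) for x in range(n)] for label in range(n)]

def close(u, v, tol=1e-9):
    """Compare two complex numbers up to a small tolerance."""
    return abs(u - v) < tol

table = character_table(4)

# Homomorphism condition for chi_1 on Z_4: chi(x + y mod 4) = chi(x) * chi(y).
chi_1 = character(4, 1)
homomorphism_holds = all(
    close(chi_1((x + y) % 4), chi_1(x) * chi_1(y)) for x in range(4) for y in range(4)
)
```

Up to rounding, row `table[1]` is $(1, i, -1, -i)$ and row `table[3]` is $(1, -i, -1, i)$, matching the table.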


The representations of a product $\mathbf{Z}_n \times \mathbf{Z}_m$ can be defined in terms of the representations of
each of its factors. Let $\chi_i$ be the $n$ different representations of $\mathbf{Z}_n$ and $\chi_j$ be the $m$ different
representations of $\mathbf{Z}_m$. Then

$$\chi_{ij}((g, h)) = \chi_i(g)\chi_j(h)$$

are all $nm$ distinct representations of $\mathbf{Z}_n \times \mathbf{Z}_m$. We have labeled these representations by group
elements $(i, j) \in \mathbf{Z}_n \times \mathbf{Z}_m$.


Example B.1.3 The $2^n$ representations of $\mathbf{Z}_2^n$ have a particularly nice form. If we write each
element $b$ of $\mathbf{Z}_2^n$ as $b = (b_0, b_1, \ldots, b_{n-1})$, where each $b_i$ is a binary variable, then the group
representation $\chi_b$ is the $n$-way product of the two representations $\chi_0$ and $\chi_1$ for $\mathbf{Z}_2$,

$$\chi_b(a) = \chi_{b_0}(a_0) \cdots \chi_{b_{n-1}}(a_{n-1}) = (-1)^{a \cdot b},$$

where $a \cdot b$ is the standard dot product of the vectors $a$ and $b$.


Since any finite Abelian group is isomorphic to a finite product $\mathbf{Z}_{n_0} \times \cdots \times \mathbf{Z}_{n_k}$ of cyclic
groups, the definition of $\chi_j$, together with the result about representations for product groups,
provides an effective way to construct all of the representations for any finite Abelian group.
These representations may be labeled by group elements as before.

For Abelian groups, the set of representations itself forms a group denoted by $\hat{G}$ where

• the representation $\chi(g) = 1$ for all $g \in G$ is the identity,

• the product $\chi = \chi_i \circ \chi_j$ of two representations $\chi_i$ and $\chi_j$ defined by $\chi(g) = \chi_i(g)\chi_j(g)$ for all
$g \in G$ is itself a representation, and

• the inverse of any representation $\chi$ is defined by
$\chi^{-1}(g) = 1/\chi(g) = \overline{\chi(g)}$
for all $g \in G$.

For a subgroup $H < G$, let $H^\perp = \{g \in G \mid \chi_g(h) = 1,\ \forall h \in H\}$. Since $G$ is Abelian, the
set of cosets of $H$ in $G$ forms a group $G/H$, the quotient group of $G$ modulo $H$, of order
$[G : H] = |G|/|H|$. The $[G : H]$ representations of $G/H$ are in one-to-one correspondence with
representations of $G$ that map all elements of $H$ to 1. Thus, there are exactly $[G : H]$
representations in $H^\perp$. The set $H^\perp$ forms a group that has representations in its own right. Since $H^\perp$
has size $[G : H]$, there are exactly $[G : H]$ distinct representations of the group $H^\perp$. An element
$g' \in G$ acts as a representation $\chi_{g'}$ of $H^\perp$ in the following way:

$$g' : H^\perp \to \mathbf{C}, \qquad \chi_g \mapsto \chi_g(g').$$

Not all of these representations are distinct, however. All $h \in H$ act as the trivial representation
on $H^\perp$:

$$h : H^\perp \to \mathbf{C}, \qquad \chi_g \mapsto \chi_g(h) = 1.$$

The group $H^{\perp\perp} = \{g' \in G \mid \chi_{g'}(g) = 1,\ \forall g \in H^\perp\}$ has size $|G|/[G : H] = |H|$. By definition of
$H^\perp$ and $\chi_{g'}$,

$$H^{\perp\perp} = \{g' \in G \mid g'(\chi_g) = 1,\ \forall g \in H^\perp\} = \{g' \in G \mid \chi_g(g') = 1,\ \forall g \in H^\perp\}.$$

Thus, all elements of $H$ are contained in $H^{\perp\perp}$. Since $|H^{\perp\perp}| = |H|$,

$$H^{\perp\perp} = H.$$

Chapter 11 discusses groups $C$ that are classical error correcting codes. The dual group $C^\perp$ to a
classical code $C$ is defined in the way we just discussed. Classical codes and their duals form the
basis for the construction of the quantum CSS codes discussed in section 11.3.


Example B.1.4 Any subgroup $H$ of $G = \mathbf{Z}_2^n$ is isomorphic to $\mathbf{Z}_2^k$ for some $k$. Since there are
$[G : H] = 2^{n-k}$ elements of $H^\perp < G$, $H^\perp$ is isomorphic to $\mathbf{Z}_2^{n-k}$. Using the expression for
the representations of $\mathbf{Z}_2^n$ of example B.1.3, the elements of $H^\perp$ are the elements $b$ such that
$\chi_b(a) = (-1)^{a \cdot b} = 1$ for all $a \in H$. Thus, $H^\perp = \{b \mid a \cdot b = 0 \bmod 2,\ \forall a \in H\}$.
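
For small $n$, the set $H^\perp$ of this example can be computed by brute force. The sketch below is a hedged illustration; the names `span` and `perp` and the choice of $H$ generated by $(1, 1, 0)$ in $\mathbf{Z}_2^3$ are assumptions made for the example.

```python
from itertools import product

def dot_mod2(a, b):
    """Dot product of two bit tuples, reduced mod 2."""
    return sum(x * y for x, y in zip(a, b)) % 2

def span(generators, n):
    """Subgroup of Z_2^n generated by the given bit tuples."""
    elements = {tuple([0] * n)}
    for g in generators:
        # Adding g to every element found so far closes the set under g.
        elements |= {tuple((x + y) % 2 for x, y in zip(e, g)) for e in elements}
    return elements

def perp(subgroup, n):
    """All b in Z_2^n with a . b = 0 mod 2 for every a in the subgroup."""
    return {b for b in product((0, 1), repeat=n)
            if all(dot_mod2(a, b) == 0 for a in subgroup)}

n = 3
H = span([(1, 1, 0)], n)   # a subgroup isomorphic to Z_2, so k = 1
H_perp = perp(H, n)        # should have 2**(n - k) = 4 elements
```

Applying `perp` twice returns the original subgroup, in line with $H^{\perp\perp} = H$.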


To define the quantum Fourier transform for a general Abelian group, we need a technical 
result, Schur’s lemma, that is a generalization of identity 11.7 for the Walsh-Hadamard
transformation.

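B.1.1 Schur's Lemma

Schur's lemma Let $\chi_i$ and $\chi_j$ be representations of an Abelian group $G$. Then,

$$\sum_{g \in G} \chi_i(g)\overline{\chi_i(g)} = |G|,$$

and

$$\sum_{g \in G} \chi_i(g)\overline{\chi_j(g)} = 0 \quad \text{for } \chi_i \neq \chi_j.$$

The first case follows by observing that $a\overline{a} = 1$ for any root of unity $a$. For $i \neq j$,

$$\begin{aligned}
\chi_i(h) \sum_{g \in G} \chi_i(g)\overline{\chi_j(g)} &= \sum_{g \in G} \chi_i(h)\chi_i(g)\overline{\chi_j(g)} \\
&= \sum_{g \in G} \chi_i(hg)\overline{\chi_j(h^{-1}hg)} \\
&= \sum_{g \in G} \chi_i(g)\overline{\chi_j(h^{-1}g)} \\
&= \sum_{g \in G} \chi_i(g)\chi_j(h)\overline{\chi_j(g)} \\
&= \chi_j(h) \sum_{g \in G} \chi_i(g)\overline{\chi_j(g)}.
\end{aligned}$$

Since $\chi_i(h) \neq \chi_j(h)$ for some $h$, it follows that $\sum_{g \in G} \chi_i(g)\overline{\chi_j(g)} = 0$.

If we think of $\chi_i$ as a complex vector of $n$ elements $(\chi_i(g_0), \ldots, \chi_i(g_{n-1}))$, then Schur’s lemma
says that $\chi_i$ has length $\sqrt{|G|}$ and any two different vectors $\chi_i$ and $\chi_j$ are orthogonal.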
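
The orthogonality relations can be verified numerically for a small product of cyclic groups. This is a sketch under assumptions: the group $\mathbf{Z}_4 \times \mathbf{Z}_3$ and the helper names `chi` and `schur_sum` are chosen for illustration, and results are floating point.

```python
import cmath
from itertools import product

def chi(label, g, orders):
    """Representation chi_label of Z_{n_0} x ... x Z_{n_k} evaluated at the element g."""
    phase = sum(l * x / n for l, x, n in zip(label, g, orders))
    return cmath.exp(2j * cmath.pi * phase)

def schur_sum(label_i, label_j, orders):
    """Sum over g of chi_i(g) times the complex conjugate of chi_j(g)."""
    group = product(*(range(n) for n in orders))
    return sum(chi(label_i, g, orders) * chi(label_j, g, orders).conjugate()
               for g in group)

orders = (4, 3)  # the group Z_4 x Z_3, of order 12
labels = list(product(*(range(n) for n in orders)))
same = schur_sum((1, 2), (1, 2), orders)       # approximately |G| = 12
different = schur_sum((1, 2), (3, 1), orders)  # approximately 0
```

Up to rounding error, the sum is $|G|$ when the two labels coincide and 0 otherwise.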

Schur's lemma for subgroups A simple corollary of Schur’s lemma holds for representations $\chi$
of $G$ restricted to subgroups:

$$\sum_{h \in H} \chi(h) = \begin{cases} |H| & \text{if } \chi(h) = 1,\ \forall h \in H \\ 0 & \text{otherwise.} \end{cases}$$

Since any representation $\chi$ of $G$ is a representation of $H$ when restricted to $H$, we can apply
Schur’s lemma directly to $\chi$ viewed as a representation of $H$ to obtain this equality.

B.2 Quantum Fourier Transforms for Finite Abelian Groups

This section defines quantum Fourier transforms over finite Abelian groups. Section B.2.1 defines
the Fourier basis for an Abelian group. This basis is used in the definition of the quantum Fourier
transform over an Abelian group given in section B.2.2.

B.2.1 The Fourier Basis of an Abelian Group

To an Abelian group $G$ with $|G| = n$, we associate an $n$-dimensional complex vector space $V$
by labeling a basis for the vector space with the $n$ elements of the group $\{|g_0\rangle, \ldots, |g_{n-1}\rangle\}$. The
Fourier transform of section 7.8 takes elements of this basis to another, the Fourier basis. As
the first step to generalizing the Fourier transform to general Abelian groups, this section defines
the Fourier basis for $V$ corresponding to the basis $\{|g_0\rangle, \ldots, |g_{n-1}\rangle\}$. The Fourier basis is defined
in terms of the set of group representations $\chi_g$ of $G$.
A group G acts in a natural way upon itself: for every group element $g \in G$, there is a map
from G to G that sends a to ga for all elements $a \in G$. This map can be viewed as a unitary
transform $T_g$ acting on V that takes

\[ T_g : |a\rangle \mapsto |ga\rangle. \]

The transformation $T_g$ is unitary for any g because it is reversible, $T_{g^{-1}} T_g = I$, and maps basis
states to basis states.

The Fourier basis of G with respect to a particular labeling $\chi_g$ of the representations of G
consists of all $\{|e_k\rangle \mid k \in G\}$, where

\[ |e_k\rangle = \frac{1}{\sqrt{|G|}} \sum_{g \in G} \overline{\chi_k(g)}\,|g\rangle. \]


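From Schur's lemma and the fact that $\langle g'|g\rangle = 0$ for $g \neq g'$, it is easy to see that this set forms a
basis, since

\[
\begin{aligned}
\langle e_j|e_k\rangle
  &= \frac{1}{|G|} \left( \sum_{g' \in G} \chi_j(g')\langle g'| \right) \left( \sum_{g \in G} \overline{\chi_k(g)}\,|g\rangle \right) \\
  &= \frac{1}{|G|} \sum_{g' \in G} \sum_{g \in G} \chi_j(g')\overline{\chi_k(g)}\,\langle g'|g\rangle \\
  &= \frac{1}{|G|} \sum_{g \in G} \chi_j(g)\overline{\chi_k(g)} \\
  &= \delta_{jk}.
\end{aligned}
\]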


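For each $k \in G$, the vector $|e_k\rangle$ is an eigenvector of $T_j : |h\rangle \mapsto |jh\rangle$ with eigenvalue $\chi_k(j)$:

\[
\begin{aligned}
T_j|e_k\rangle
  &= \frac{1}{\sqrt{|G|}} \sum_{g \in G} \overline{\chi_k(g)}\,T_j|g\rangle \\
  &= \frac{1}{\sqrt{|G|}} \sum_{g \in G} \overline{\chi_k(g)}\,|jg\rangle \\
  &= \frac{1}{\sqrt{|G|}} \sum_{g \in G} \chi_k(j)\overline{\chi_k(jg)}\,|jg\rangle \\
  &= \chi_k(j)\,\frac{1}{\sqrt{|G|}} \sum_{h \in G} \overline{\chi_k(h)}\,|h\rangle \\
  &= \chi_k(j)\,|e_k\rangle.
\end{aligned}
\]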
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Example B.2.1 The Fourier basis for Z 2 is 

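\[
\begin{aligned}
|e_0\rangle &= \tfrac{1}{\sqrt{2}}\left(\overline{\chi_0(0)}\,|0\rangle + \overline{\chi_0(1)}\,|1\rangle\right) = \tfrac{1}{\sqrt{2}}\left(|0\rangle + |1\rangle\right) \\
|e_1\rangle &= \tfrac{1}{\sqrt{2}}\left(\overline{\chi_1(0)}\,|0\rangle + \overline{\chi_1(1)}\,|1\rangle\right) = \tfrac{1}{\sqrt{2}}\left(|0\rangle - |1\rangle\right).
\end{aligned}
\]

Similarly, for $\mathbf{Z}_4$ we get

\[
\begin{aligned}
|e_0\rangle &= \tfrac{1}{2}\sum_{i=0}^{3} \overline{\chi_0(i)}\,|i\rangle = \tfrac{1}{2}\left(|0\rangle + |1\rangle + |2\rangle + |3\rangle\right) \\
|e_1\rangle &= \tfrac{1}{2}\sum_{i=0}^{3} \overline{\chi_1(i)}\,|i\rangle = \tfrac{1}{2}\left(|0\rangle - i|1\rangle - |2\rangle + i|3\rangle\right) \\
|e_2\rangle &= \tfrac{1}{2}\sum_{i=0}^{3} \overline{\chi_2(i)}\,|i\rangle = \tfrac{1}{2}\left(|0\rangle - |1\rangle + |2\rangle - |3\rangle\right) \\
|e_3\rangle &= \tfrac{1}{2}\sum_{i=0}^{3} \overline{\chi_3(i)}\,|i\rangle = \tfrac{1}{2}\left(|0\rangle + i|1\rangle - |2\rangle - i|3\rangle\right).
\end{aligned}
\]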
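
The Fourier basis vectors and their eigenvector property can be reproduced in a few lines. This is a minimal sketch for the cyclic group $\mathbf{Z}_n$ written additively, assuming the standard labeling $\chi_k(g) = \exp(2\pi i kg/n)$; the helper names `chi`, `fourier_basis_vector`, `shift`, and `close` are illustrative and not taken from the text.

```python
import cmath
import math

def chi(k, g, n):
    """Character chi_k(g) = exp(2*pi*i*k*g/n) of Z_n."""
    return cmath.exp(2j * cmath.pi * k * g / n)

def fourier_basis_vector(k, n):
    """Amplitudes of |e_k> = 1/sqrt(n) * sum_g conj(chi_k(g)) |g>."""
    return [chi(k, g, n).conjugate() / math.sqrt(n) for g in range(n)]

def shift(j, vec):
    """Apply T_j : |g> -> |j + g mod n> to a list of amplitudes."""
    n = len(vec)
    out = [0j] * n
    for g, amp in enumerate(vec):
        out[(j + g) % n] += amp
    return out

def close(u, v, tol=1e-9):
    """Compare two amplitude lists entry by entry."""
    return all(abs(a - b) < tol for a, b in zip(u, v))

n = 4
e1 = fourier_basis_vector(1, n)  # expected: 1/2 (|0> - i|1> - |2> + i|3>)

# T_j |e_k> should equal chi_k(j) |e_k> for every j and k.
eigen_ok = all(
    close(shift(j, fourier_basis_vector(k, n)),
          [chi(k, j, n) * a for a in fourier_basis_vector(k, n)])
    for k in range(n) for j in range(n)
)
```

Here `e1` reproduces the amplitudes of $|e_1\rangle$ for $\mathbf{Z}_4$ in the example, and `eigen_ok` confirms that every $|e_k\rangle$ is an eigenvector of every shift $T_j$ with eigenvalue $\chi_k(j)$.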


B.2.2 The Quantum Fourier Transform Over a Finite Abelian Group 

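The quantum Fourier transform for an Abelian group G is the transformation $\mathcal{F}$ that maps $|e_g\rangle$
to $|g\rangle$,

\[ \mathcal{F} = \sum_{g \in G} |g\rangle\langle e_g|. \]

Consider the effect of $\mathcal{F}$ on a group element $|h\rangle$. With
$\langle e_k| = \frac{1}{\sqrt{|G|}} \sum_{g \in G} \chi_k(g)\langle g|$ we get

\[ \langle e_k|h\rangle = \frac{1}{\sqrt{|G|}} \sum_{g \in G} \chi_k(g)\langle g|h\rangle = \frac{1}{\sqrt{|G|}}\,\chi_k(h) \]

and thus,

\[ \mathcal{F}|h\rangle = \sum_{g \in G} |g\rangle\langle e_g|h\rangle = \frac{1}{\sqrt{|G|}} \sum_{g \in G} \chi_g(h)\,|g\rangle. \]

It follows that the matrix for $\mathcal{F}$ in the standard basis has entries

\[ \mathcal{F}_{gh} = \langle g|\mathcal{F}|h\rangle = \frac{\chi_g(h)}{\sqrt{|G|}}. \]

The inverse Fourier transform is

\[ \mathcal{F}^{-1} = \sum_{g \in G} |e_g\rangle\langle g|. \]

With $\mathcal{F}^{-1}|h\rangle = |e_h\rangle = \frac{1}{\sqrt{|G|}} \sum_{g \in G} \overline{\chi_h(g)}\,|g\rangle$, the matrix for $\mathcal{F}^{-1}$ in the standard basis has entries

\[ \mathcal{F}^{-1}_{gh} = \frac{\overline{\chi_h(g)}}{\sqrt{|G|}}. \]

Suppose that $\mathcal{F}_G$ and $\mathcal{F}_H$ are Fourier transforms for G and H, respectively. If the elements
$(g, h) \in G \times H$ are encoded as $|g\rangle|h\rangle$, then $\mathcal{F}_{G \times H} = \mathcal{F}_G \otimes \mathcal{F}_H$ is a Fourier transform for $G \times H$.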


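Example B.2.2 The Hadamard transformation H is the Fourier transform for $\mathbf{Z}_2$:

\[
\mathcal{F}_2 = \frac{1}{\sqrt{2}}
\begin{pmatrix} \chi_0(0) & \chi_0(1) \\ \chi_1(0) & \chi_1(1) \end{pmatrix}
= \frac{1}{\sqrt{2}}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
= H.
\]

The k-bit Walsh-Hadamard transform W is the Fourier transform for $\mathbf{Z}_2^k$. In standard labeling, the
representations for $\mathbf{Z}_2^k$ are of the form $\chi_i(j) = (-1)^{i \cdot j}$. For instance, the Fourier transform
for $\mathbf{Z}_2 \times \mathbf{Z}_2$ is

\[
\mathcal{F}_{2 \times 2} = H \otimes H = \frac{1}{2}
\begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & -1 & 1 & -1 \\
1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1
\end{pmatrix}.
\]

By comparison, $\mathcal{F}_4$, the Fourier transform for $\mathbf{Z}_4$, is

\[
\mathcal{F}_4 = \frac{1}{2}
\begin{pmatrix}
i^0 & i^0 & i^0 & i^0 \\
i^0 & i^1 & i^2 & i^3 \\
i^0 & i^2 & i^4 & i^6 \\
i^0 & i^3 & i^6 & i^9
\end{pmatrix}
= \frac{1}{2}
\begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & i & -1 & -i \\
1 & -1 & 1 & -1 \\
1 & -i & -1 & i
\end{pmatrix}.
\]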
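
The two 4 x 4 transforms in this example can be built directly from the entry formula $\mathcal{F}_{gh} = \chi_g(h)/\sqrt{|G|}$ and compared. The sketch below uses only the Python standard library and assumes the standard labeling for $\mathbf{Z}_n$; the helper names (`qft_matrix`, `kron`, `matmul`, `dagger`, `allclose`) are illustrative.

```python
import cmath
import math

def qft_matrix(n):
    """Matrix with entries F[g][h] = exp(2*pi*i*g*h/n) / sqrt(n) for Z_n."""
    return [[cmath.exp(2j * cmath.pi * g * h / n) / math.sqrt(n)
             for h in range(n)] for g in range(n)]

def kron(a, b):
    """Kronecker (tensor) product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose."""
    return [[a[j][i].conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def allclose(a, b, tol=1e-9):
    """Entrywise comparison of two matrices."""
    return all(abs(x - y) < tol
               for ra, rb in zip(a, b) for x, y in zip(ra, rb))

f2 = qft_matrix(2)      # the Hadamard matrix
f2x2 = kron(f2, f2)     # Fourier transform for Z_2 x Z_2
f4 = qft_matrix(4)      # Fourier transform for Z_4
identity4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
```

Both `f2x2` and `f4` are unitary, but they are different matrices: the groups $\mathbf{Z}_2 \times \mathbf{Z}_2$ and $\mathbf{Z}_4$ have the same order and different Fourier transforms.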


The quantum Fourier transform can be defined for non-Abelian groups as well. The definition
is in terms of group representations, but the set of representations for non-Abelian groups is much
more complicated than for the Abelian case. All of these quantum Fourier transforms have efficient
implementations. Even in the Abelian case, some of the implementations are simpler than others.
One useful property is that if $U_1$ and $U_2$ are two quantum algorithms implementing the quantum
Fourier transforms for groups $G_1$ and $G_2$ respectively, then $U_1 \otimes U_2$ implements the quantum
Fourier transform for $G_1 \times G_2$. Section 7.8 gave an $O(n^2)$ implementation for quantum Fourier
transforms over the groups $\mathbf{Z}_{2^n}$. Section B.6 gives pointers to papers on efficient implementations
for quantum Fourier transforms over other groups. We now turn to the use of quantum Fourier
transforms in solving the hidden subgroup problem for Abelian groups.

B.3 General Solution to the Finite Abelian Hidden Subgroup Problem 

This section explains how to solve the finite Abelian hidden subgroup problem. Suppose a group
G, with cyclic decomposition $G = \mathbf{Z}_{n_0} \times \cdots \times \mathbf{Z}_{n_L}$, contains a subgroup $H < G$ that is implicitly
defined by a function $f : G \to G$ in that f is constant on each coset of H and takes distinct values on
distinct cosets. Suppose further that $U_f$ can be computed in polylogarithmic time with respect to the
size of the group G. This section shows how, with high probability, generators for H can be found in
polylogarithmic time.

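A general procedure used to solve the Abelian hidden subgroup problem consists of four steps
followed by a final measurement. This procedure is repeated a number of times that depends on
the desired level of certainty $1 - \epsilon$.

\[
\begin{array}{ll}
\text{initialization:} & \dfrac{1}{\sqrt{|G|}} \displaystyle\sum_{g \in G} |g\rangle|0\rangle \\[2ex]
U_f\text{:} & \dfrac{1}{\sqrt{|G|}} \displaystyle\sum_{g \in G} |g\rangle|f(g)\rangle \\[2ex]
\text{measurement:} & \dfrac{1}{\sqrt{|H|}} \displaystyle\sum_{h \in H} |\tilde{g}h\rangle \\[2ex]
\mathcal{F}_G\text{:} & \dfrac{1}{\sqrt{|G||H|}} \displaystyle\sum_{g \in G} \chi_g(\tilde{g}) \left( \sum_{h \in H} \chi_g(h) \right) |g\rangle
\end{array}
\]

Here $\tilde{g}$ denotes a representative of the coset selected by the measurement of the second register.
A measurement of the final state returns, with equal probability, a $g \in H^\perp$, that is, a g such that
$\chi_g(h) = 1$ for all $h \in H$.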

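We now go through these steps in more detail. After computing $U_f$ on the superposition of all
group elements,

\[ U_f \left( \frac{1}{\sqrt{|G|}} \sum_{g \in G} |g\rangle|0\rangle \right) = \frac{1}{\sqrt{|G|}} \sum_{g \in G} |g\rangle|f(g)\rangle, \]

a measurement of the second register randomly yields a single $f(\tilde{g})$ for some $\tilde{g} \in G$. Since
$f(\tilde{g}) = f(\tilde{g}h)$ for all $h \in H$, and by assumption f is different on every coset, $f(\tilde{g})$ is the value
of f on all elements of the coset $\tilde{g}H$ and on no others. After this measurement, we have the state

\[ |\psi\rangle = \frac{1}{\sqrt{|H|}} \sum_{h \in H} |\tilde{g}h\rangle, \]

a superposition over only elements of the coset $\tilde{g}H$. Each coset is equally likely to be the result
of this measurement, so measuring $|\psi\rangle$ at this point yields a random element of G with equal
probability. The key insight is that the Fourier transform of the state $|\psi\rangle$ eliminates the constant
$\tilde{g}$ and allows us to extract information about H.

The state $|\psi\rangle$ is the image of the state $\frac{1}{\sqrt{|H|}} \sum_{h \in H} |h\rangle$ under the transformation

\[ T_{\tilde{g}} : |a\rangle \mapsto |\tilde{g}a\rangle. \]

Apply the quantum Fourier transform to $|\psi\rangle$. Using $\mathcal{F}|h\rangle = \frac{1}{\sqrt{|G|}} \sum_{g \in G} \chi_g(h)|g\rangle$ from section B.2.2,

\[ \mathcal{F}|\psi\rangle = \frac{1}{\sqrt{|G||H|}} \sum_{g \in G} \chi_g(\tilde{g}) \left( \sum_{h \in H} \chi_g(h) \right) |g\rangle. \]

The factor $\chi_g(\tilde{g})$ has modulus 1, so it does not affect the measurement probabilities.
Schur's lemma for subgroups says $\sum_{h \in H} \chi_g(h) \neq 0$ if and only if $\chi_g(h) = 1$ for all $h \in H$. It
follows that measuring this state returns the index g of some representation $\chi_g$ that is constant
(1) on H. For product groups $G = G_0 \times \cdots \times G_L$, the element g of $H^\perp$ returned is in the form
$g = (g_0, g_1, \dots, g_L)$, where $g_i$ is an element of $G_i$.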

To obtain a complete set of generators for $H^\perp$, we repeat the preceding algorithm a number
of times that depends on our desired level of certainty $1 - \epsilon$. If the first n − 1 group elements
returned do not yet generate all of $H^\perp$, the next run through the algorithm has at least a 50 percent
chance of returning an element of $H^\perp$ not generated by the previous elements, since any proper
subgroup has index at least 2 in the whole group. Thus, by repeating this procedure the appropriate
number of times, we can obtain any desired level of certainty $1 - \epsilon$.

We have now completed the quantum part of the solution. From a sufficient number of elements
of $H^\perp$, classical methods can efficiently find a full set of generators for H.
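
The effect of the Fourier transform on a coset state can be checked with a small classical state-vector simulation. The sketch below is illustrative only: it assumes the cyclic group $\mathbf{Z}_n$ written additively, a hidden subgroup H generated by a divisor r of n, and the standard labeling $\chi_g(h) = \exp(2\pi i gh/n)$. The function names and the values n = 12, r = 4 are hypothetical, and the simulation takes time linear in |G| per amplitude, so it says nothing about quantum running time.

```python
import cmath
import math

def coset_state(n, r, offset):
    """Uniform superposition over the coset offset + H, with H = {0, r, 2r, ...} in Z_n."""
    size_h = n // r
    state = [0j] * n
    for k in range(size_h):
        state[(offset + k * r) % n] = 1 / math.sqrt(size_h)
    return state

def qft(state):
    """Apply F|h> = 1/sqrt(n) * sum_g chi_g(h)|g> to a list of amplitudes."""
    n = len(state)
    return [sum(cmath.exp(2j * cmath.pi * g * h / n) * state[h] for h in range(n))
            / math.sqrt(n) for g in range(n)]

def measurement_distribution(n, r, offset):
    """Probability of each outcome g after the Fourier transform of the coset state."""
    return [abs(amp) ** 2 for amp in qft(coset_state(n, r, offset))]

n, r = 12, 4  # illustrative values with r dividing n
h_perp = [g for g in range(n) if (g * r) % n == 0]  # indices g with chi_g = 1 on H

# The distribution is the same for every coset offset.
distributions = [measurement_distribution(n, r, offset) for offset in range(r)]
```

For every offset the distribution is uniform on `h_perp` (here the four elements 0, 3, 6, 9) and zero elsewhere, which is the sense in which the coset representative drops out.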


B.4 Instances of the Abelian Hidden Subgroup Problem 


B.4.1 Simon's Problem 

Simon's problem works with the group $G = \mathbf{Z}_2^n$ that has representations $\chi_x(y) = (-1)^{x \cdot y}$. The
function

\[ f(g \oplus a) = f(g) \]

defines a subgroup

\[ A = \{0, a\}. \]

The measurement at the end of one run through the four-step procedure for solving Abelian hidden
subgroup problems returns an element

\[ x_j \in A^\perp = \{x \mid (-1)^{x \cdot y} = 1 \text{ for all } y \in A\}. \]

The element $x_j$ must satisfy $x_j \cdot y = 0 \bmod 2$ for all $y \in A$. With sufficiently many values $x_j$, we
can solve for a. In this problem, we know that we have found a solution when there is a unique
non-zero solution for a.
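
The classical step of solving for a can be illustrated on a toy instance. This sketch encodes bit strings as Python integers and finds candidates by brute force over all nonzero strings, which is only feasible for small n; an efficient implementation would instead use Gaussian elimination over $\mathbf{Z}_2$. The sample values and function names are hypothetical.

```python
def dot2(x, y):
    """Inner product modulo 2 of two bit strings encoded as integers."""
    return bin(x & y).count("1") % 2

def nonzero_solutions(samples, n):
    """All nonzero a in Z_2^n with x . a = 0 mod 2 for every sampled x."""
    return [a for a in range(1, 2 ** n)
            if all(dot2(x, a) == 0 for x in samples)]

n = 3
# Hypothetical measurement results drawn from A-perp for the hidden a = 0b101.
one_sample = [0b010]
two_samples = [0b010, 0b111]

candidates_after_one = nonzero_solutions(one_sample, n)   # several candidates remain
candidates_after_two = nonzero_solutions(two_samples, n)  # unique candidate: a is found
```

With one sample, three nonzero candidates remain; with the second, independent sample the only nonzero solution is a = 101, which is the stopping condition described above.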





B.4.2 Shor's Algorithm: Finding the Period of a Function 

For simplicity, assume that r divides n (see section 8.2.1 for the general case) and work with the
group

\[ G = \mathbf{Z}_n. \]

The periodic function f has the property

\[ f(x + r) = f(x), \]

which defines the subgroup

\[ H = \{kr \mid k \in [0, \dots, n/r)\}. \]

The problem is to find the generator r of the subgroup. In the standard labeling of representations
for $\mathbf{Z}_n$,

\[ \chi_g(h) = \exp\left(2\pi i\,\frac{gh}{n}\right) \]

and

\[
\begin{aligned}
H^\perp &= \left\{x \;\middle|\; \exp\left(2\pi i\,\frac{xh}{n}\right) = 1 \text{ for all } h \in H\right\} \\
        &= \{x \mid xkr = 0 \bmod n \text{ for all } k \in [0, \dots, n/r)\}.
\end{aligned}
\]

Measurement after one round of the four-step procedure yields $x \in H^\perp$. The element x satisfies
$xkr = 0 \bmod n$ for all $k \in [0, \dots, n/r)$. In particular, $xr = 0 \bmod n$, so x is a multiple of n/r. The
period r can now be computed as in section 8.2.1.
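
In the simplified case where r divides n, the measured values are all multiples of n/r, so a greatest common divisor already recovers a candidate for r. The sketch below is a stand-in under that assumption and is not the procedure of section 8.2.1; the function name and sample values are hypothetical.

```python
from functools import reduce
from math import gcd

def period_from_samples(samples, n):
    """Candidate period from elements x of H-perp, assuming r divides n.

    Every x is a multiple of n/r, so gcd(n, x_1, x_2, ...) is a multiple of n/r
    and the returned value always divides r.
    """
    d = reduce(gcd, samples, n)
    return n // d

n = 12                     # illustrative group size; true period r = 4, H-perp = {0, 3, 6, 9}
partial = period_from_samples([6], n)       # one unlucky sample gives only a divisor of r
complete = period_from_samples([6, 9], n)   # a second sample pins down r
```

A candidate returned this way always divides r, so it can be confirmed by checking whether f repeats with that period and, if not, refined with further samples.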


B.5 Comments on the Non-Abelian Hidden Subgroup Problem 

No one knows how to solve the general hidden subgroup problem. Quantum Fourier transforms 
can be defined over non-Abelian groups. In fact, efficient implementations of quantum Fourier 
transforms over all finite groups are known. It is not known, however, how to use the quantum 
Fourier transformation to extract information about the generators of hidden subgroups for most 
non-Abelian groups. Worse still, researchers have proved that Fourier sampling, a general technique
based on Shor's technique, cannot be used to solve the general hidden subgroup problem.
Section 13.1 briefly describes more recent progress in understanding quantum approaches to the 
non-Abelian hidden subgroup problem. 


B.6 References 


Kitaev [172] presents a solution for the Abelian stabilizer problem and relates it to factoring
and discrete logarithms. The general hidden subgroup problem as presented in this appendix and
its solution were introduced by Mosca and Ekert [214]. Ekert and Jozsa [112] and Jozsa [165]
analyze the quantum Fourier transform in the context of the hidden subgroup problem. Hallgren
[148] studies extensions to the non-Abelian case. Grigni et al. [141] showed in 2001 that for
most non-Abelian groups, Fourier sampling yields only exponentially little information about the
hidden subgroup.

B.7 Exercises 

Exercise B.1. Let G and H be finite graphs. A map $f : G \to H$ is a graph isomorphism if it
is one-to-one and $f(g_1)$ and $f(g_2)$ have an edge between them if and only if $g_1$ and $g_2$ do. An
automorphism of G is a graph isomorphism from G to itself, $f : G \to G$. A graph automorphism
of G is a permutation of its vertices. The graph isomorphism problem is to find an efficient
algorithm for determining whether there is an isomorphism between two graphs or not.

a. Show that the set Aut(G) of automorphisms of a graph G forms a group, a subgroup of the
permutation group $S_n$, where n = |G|.

b. Two graphs $G_1$ and $G_2$ are isomorphic if there exists at least one automorphism in
$\mathrm{Aut}(G_1 \cup G_2) < S_{2n}$ that maps nodes of $G_1$ to $G_2$ and vice versa. Show that if $G_1$ and $G_2$ are
nonisomorphic connected graphs, then $\mathrm{Aut}(G_1 \cup G_2) = \mathrm{Aut}(G_1) \times \mathrm{Aut}(G_2)$.

c. Show that if $\mathrm{Aut}(G_1 \cup G_2)$ is strictly bigger than $\mathrm{Aut}(G_1) \times \mathrm{Aut}(G_2)$, then there must be an
element of $\mathrm{Aut}(G_1 \cup G_2)$ that swaps $G_1$ and $G_2$.

d. Express the graph isomorphism problem as a hidden subgroup problem. 

Exercise B.2. Write out the algorithm that solves Simon’s problem using the hidden subgroup 
framework of section B.3. 

Exercise B.3. Write out an algorithm that finds the period of a function using the hidden subgroup 
framework of section B.3. 

Exercise B.4. Find an efficient algorithm that solves the discrete logarithm problem. 
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Standard Notation 

$|x|$               absolute value
$[x..y]$            closed interval
$\approx$           approximately equal
$e$                 2.718281...
$i$                 $\sqrt{-1}$
$\exp(x)$           $e^x$
$\log$              logarithm base $e$
$\log_m$            logarithm base $m$
$\mathbf{v}$        traditional vector notation
$\mathbf{v}^T$      transpose of a vector or matrix
$a_{ij}$            element $i, j$ of matrix $A$
$\det A$            determinant of $A$
$\lambda$           generic eigenvalue
$U^{-1}$            inverse of a unitary transformation, quantum algorithm
$U^\dagger$         conjugate transpose
$\mathbf{C}$        the complex numbers
$\mathbf{R}$        the real numbers
$\mathbf{R}^n$      $n$ dimensional real space
$\mathbf{Z}$        the integers
$\mathbf{Z}_2$      the integers modulo 2
$\mathbf{Z}_2^n$    group of $n$-bit strings under bitwise addition modulo 2
$|G|$               order of a group
$\hat{G}$           the group of representations of $G$
$\chi$              group homomorphism
$H < G$             subgroup relation
$\circ$             generic group operation
$\cong$             isomorphism
$Z(S)$              the centralizer of subgroup $S$





Notation Index 


General Concepts

Symbol                      Meaning                                                  Page     Section
$\oplus$                    bit-wise exclusive or operation                          p. 100   6.1
$\overline{a}$              complex conjugate                                        p. 14    2.2
$\Sigma$                    a set of symbols (alphabet)                              p. 148   7.7
$\Sigma^*$                  the set of words over alphabet $\Sigma$                  p. 148   7.7
$O(f(n))$                   measures of complexity                                   p. 107   6.2.2
$\Omega(f(n))$                                                                       p. 107   6.2.2
$\Theta(f(n))$                                                                       p. 107

Linear Algebra

Vectors

Symbol                      Meaning                                                  Page     Section
$|v\rangle$                 quantum state vector labeled $v$                         p. 14    2.2
$\| |v\rangle \|$           length or norm of a vector                               p. 14
$\langle v|$                conjugate transpose of $|v\rangle$                       p. 15    2.2
$\langle a|b\rangle$        inner product of $\langle a|$ and $|b\rangle$                     2.2
$|a\rangle\langle b|$       outer product of $\langle a|$ and $|b\rangle$            p. 47    4.2
                            the label for $|x\rangle$ in the code space                       11.1.1
$\perp$                     used as superscript, signifying orthogonality            p. 16    2.3
$\otimes$                   right Kronecker product                                  p. 33    3.1
$\cdot$                     inner product on bit vectors                                      2.2
                            or binary vector/matrix multiplication sometimes         p. 127   7.1.1
                            scalar multiplication                                             3.1.1
$d_H(x, y)$                 Hamming distance                                                  7.1
$d_H(x)$                    Hamming weight                                                    7.1

Matrices

Symbol                      Meaning                                                  Page     Section
$\|A\|_{\mathrm{Tr}}$       trace norm of a matrix or operator                       p. 309   12.3.2
$(A|B)$                     composition of two matrices                                       11.2.5
$I^{(k)}$                   $2^k \times 2^k$ identity matrix                         p. 81
$D^{(k)}$                   $2^k \times 2^k$ diagonal matrix                                  7.8.1
$d_{\mathrm{Tr}}(\rho, \rho')$   trace metric                                                 12.3.2
$\delta_{ij}$               Kronecker delta                                          p. 14    2.2
$|A|$                       positive square root $\sqrt{A^\dagger A}$                         12.3.2
$\mathrm{tr}$               trace                                                             10.1.1
$\mathrm{tr}_A$             partial trace                                                     10.1.1

Quantum States

Transformations, Operators

Symbol                      Meaning                                                  Page     Section
$\bigwedge Q$               controlled transformation $Q$                            p. 78    5.2.4
$\bigwedge_k Q$             $Q$ controlled by $k$ control qubits                     p. 87    5.4.3
$\bigwedge^i_x Q$           single qubit transformation $Q$, controlled by a pattern p. 88    5.4.3
$U_f$                       unitary transformation for (classical) function $f$      p. 100   6.1
$H$                         Hadamard transformation
$W$                         Walsh transformation                                     p. 128   7.1.1
$K(\delta)$                 a phase shift by $\delta$                                p. 84    5.4.1
$R(\beta)$                  a rotation by $\beta$                                    p. 84    5.4.1
$T(\alpha)$                 a phase rotation by $\alpha$                             p. 84    5.4.1
$X, Y, Z, I$                elements of the Pauli group                              p. 75    5.2.1
$\sigma_X, \sigma_Y, \sigma_Z$
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